ABSTRACT


This work begins with a study of individual decision-making
under uncertainty, a problem which we formulate as

(1)  Maximize_x f(x,β)  subject to  g_i(x,β) ≥ 0,  i = 1, ..., m,

where x is a decision n-vector, β is a b-vector of exogenous
variables and parameters of the decision model, f is an objective
function to be maximized, and the g_i are constraint functions
which determine the set of feasible decisions. The source of
uncertainty is β, which is known only to lie in a given set B. We
also consider the case in which a probability distribution over B
is given.

Several methods for circumventing uncertainty in the constraints
are briefly reviewed, and several decision criteria for circumventing
uncertainty in the objective function are discussed. Particular
attention is devoted to the demonstration of certain relationships
between these criteria. It is concluded that vector maximum
reformulations of (1) play a prominent role in dealing with
uncertainty in such decision problems.

A vector maximum problem is of the form

(2)  "Maximize"_x f_1(x), ..., f_r(x)  subject to  g_i(x) ≥ 0,  i = 1, ..., m.

The quotation marks signify that it is desired to find all efficient
decisions, i.e., all decision vectors satisfying the constraints
such that it is impossible to achieve an increase in any one objective
function  without  violating  the  constraints  or  decreasing  at  least 
one  of  the  other  objective  functions.  In  Chapter  II  we  discuss  two 
methods  for  transforming  a  vector  maximum  problem  into  an  equivalent 
parametric  programming  problem.  Existing  computational  methods  for 
the  latter  problems  are  briefly  surveyed. 

The principal contribution of this work is presented in Chapter III:
a class of algorithms for solving parametric concave programming
problems of the form

(3)  Maximize_x α f_1(x) + (1-α) f_2(x)  subject to  g_i(x) ≥ 0,  i = 1, ..., m,

for each fixed value of α in the closed interval [0,1], where
f_i (i = 1,2) are strictly concave functions, g_i (i = 1,...,m)
are concave functions, and certain additional regularity assumptions
are made. Under these assumptions it is shown that (2) (with r = 2)
and (3) are equivalent in the sense that x° is efficient in (2)
if and only if x° solves (3) for some value of α in the unit
interval. The present class of algorithms is not "simplex-like"
or "gradient" in nature, but proceeds by maintaining a solution of
the Kuhn-Tucker conditions as α varies by small increments (under
our assumptions these conditions are necessary and sufficient for
an optimal solution of (3)). The main algorithm given herein displays
quadratic convergence at each increment of α. A simple modification
for handling linear equality constraints is indicated.


Problem (3) also subsumes the standard (non-parametric) concave
programming problem when a feasible solution is known. Thus the
present algorithms provide a deformation method of concave programming.
Moreover, since many of the results of this chapter hold for much more
general parametric problems than (3), the present algorithms are
pertinent to sensitivity analysis applications.

The  final  chapter  presents  a  numerical  example  which  illustrates 
the  solution  of  a  decision  problem  under  uncertainty  by  means  of  the 
techniques  discussed  in  the  preceding  chapters. 
Notation

x = (x_1, ..., x_n) is a decision vector in E^n (n-dimensional
     Euclidean space), and is under the control of the decision-maker

β = (β_1, ..., β_b) is an uncertain vector in E^b representing exogenous
     variables and model parameters, and is not under the control
     of the decision-maker

f(x,β) is a real-valued criterion function which is to be maximized;
     if there is no dependence on β, we write f(x); if there
     are several criterion functions, we write f(x) for
     (f_1(x), ..., f_r(x))

g(x,β) = (g_1(x,β), ..., g_m(x,β)) is a real vector-valued constraint
     function; if there is no dependence on β, we write g(x)

{z ∈ Z: z has property P} denotes the set of all elements z
     in the set Z which have property P; when Z is omitted,
     it is implicitly understood to be the pertinent universal
     set

X is a subset of E^n consisting of the feasible decisions; often
     X represents {x: g(x) ≧ 0}

B (in Chapter I) is a subset of E^b which is known to contain the
     "true realization" of β

x ≧ (>) 0 signifies x_i ≥ (>) 0 (i = 1,...,n)

x ≥ 0 signifies x ≧ 0 and x ≠ 0

μ denotes a probability distribution over B

C ⊆ (⊂) D signifies that the set C is a (proper) subset of D

N_r(x°) = {x: [Σ_{i=1}^n (x_i - x°_i)^2]^{1/2} < r}, an open
     neighborhood of x° of radius r

F(α) denotes the maximum α-fractile criterion (see problem (4.5)
     of Chapter I)

A(M) denotes the aspiration criterion with aspiration level M
     (see problem (4.6) of Chapter I)

[a,b) = {t ∈ E^1: a ≤ t < b}

(Pα) denotes the parametric programming problem considered in
     Chapter III; the parameter α may vary in this notation
     (there is no relation between this usage of α and that
     of Chapter I)

f(x;α) = α f_1(x) + (1-α) f_2(x)

∇_x f(x) = (∂f/∂x_1, ..., ∂f/∂x_n), the gradient of f(x)

S denotes a subset of constraint indices; S ⊆ M, where M is
     the set of the first m positive integers

u = (u_1, ..., u_m) denotes the dual variables associated with the
     Kuhn-Tucker conditions
(KT-1), ..., (KT-4) are, collectively, one version of the
     Kuhn-Tucker conditions associated with (Pα)

(=S)α is a more complete notation for the equations (KT-1) and
     (KT-2); S and α may vary in this notation

(x*(α), u*(α)) is the optimal solution and dual variables of (Pα)
     as functions of α

(x^S(α), u^S(α)) is a solution of (=S)α as a function of α

∇²_x f(x) denotes the matrix of second partial derivatives (i.e., the
     Hessian) of f(x)

<x^ν> → x° means that the (infinite) sequence x^1, x^2, ..., x^ν, ...
     converges to x°

C-D denotes the points in the set C which are not in the set D

A_α = {i ∈ M: u*_i(α) > 0}, the set of active constraints at α; α
     may vary in this notation

B_α = {i ∈ M: g_i(x*(α)) = 0}, the set of binding constraints at
     x*(α); α may vary in this notation

α'_j (j = 1,...,N) are the points of change of A_α or of B_α in the
     unit interval; α' is a generic term for a point of change

α'+ is an arbitrary point strictly between two points of change

'la.'  ^  [Q!'-)I,  a'+f],  where  i  is  defined  Immediately  above  Theorem  ^.2, 
CHAPTER I


On  the  Relevance  of  the  Vector  Maximum 
Problem  to  Decision-Making  Under  Uncertainty 

1.  Introduction 

This chapter addresses a problem of individual decision-making
under uncertainty of the form

(1)  Maximize_x f(x,β)  subject to  g(x,β) ≧ 0,

where x = (x_1, ..., x_n) is the decision vector, β = (β_1, ..., β_b) is
a vector of exogenous variables and parameters of the model, f is
the objective (or criterion or payoff) function to be maximized,
and g = (g_1, ..., g_m) is a vector-valued constraint function which
determines the set of feasible decisions. We assume that the functions
f and g are known, but that β is known only to lie in a given
set B ⊆ E^b, where E^b is b-dimensional Euclidean space. Often
we shall make the additional assumption that β may be regarded as
a random variable with a known probability distribution over B.

A choice of x must be made before β is found out, if, indeed,
it ever is revealed to the decision-maker. Throughout this chapter,
no experimentation is permitted in order to reduce uncertainty about β.

If β were known exactly, then (1) would be a well-defined
problem (provided that the desired maximum exists, of course).
But we have assumed that β is uncertain, and so (1) is not well-defined.
There are two distinct aspects of the difficulties arising from
uncertainty in β: the set of feasible decisions is uncertain, and
the objective function is uncertain. Maximization cannot be performed
until the constraints and objective function are reformulated so as
to be independent of β. We shall discuss a variety of such
reformulations, and it will be seen that quite frequently vector maximum
reformulations play a prominent role.

The  Vector  Maximum  Problem 

A vector maximum problem arises whenever there is more than one
objective function to be extremized. Consider the problem

(2)  "Maximize"_{x ∈ X} f(x),

where f(x) = (f_1(x), ..., f_r(x)) is a vector-valued objective function
(each component of f represents an objective, usually non-additive
with the others, which the decision-maker wants to maximize), and
X ⊆ E^n is a set of feasible decisions. In the fortunate event that
each component of the objective function reaches its maximum
simultaneously, as in Figure 1, then (2) is said to have a perfect solution.

In general, however, an improvement of one objective beyond a certain
point can only be obtained at the expense of worsening another.
Suppose that for a feasible decision x° there exists no other feasible
decision x^1 such that 1/ f(x^1) ≥ f(x°). Then x° is termed an
efficient solution 2/ of (2). The quotation marks in (2) signify
that it is desired to find all efficient solutions. When they are
all found, the vector maximum problem (2) has been solved.

1/ In this work we adopt the convention that x ≧ 0 signifies
x_i ≥ 0 (i = 1,...,n), x ≥ 0 signifies x_i ≥ 0 (i = 1,...,n) and
x_i > 0 for at least one i, and x > 0 signifies x_i > 0 (i = 1,...,n).

When  f  has  only  two  or  three  components,  we  envision  determining 
the  entire  set  of  efficient  solutions  and  presenting  the  corresponding 
outcomes  in  graphical  form  to  the  decision-maker,  who  would  then 
subjectively  determine  a  trade-off  between  conflicting  objectives 
and  thus  make  the  final  selection  of  a  decision.  Figures  1  and  2 
illustrate the graph of attainable outcomes for two hypothetical cases
involving  two  objective  functions.  The  efficient  outcomes  are  denoted 
by  the  heavy  line  and  dot. 
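
When only finitely many candidate decisions are under consideration, the efficient outcomes can be picked out by direct pairwise comparison. The following Python sketch illustrates this. The function names `dominates` and `efficient_indices` and the sample outcome pairs are our own illustrative assumptions; they are not taken from Figures 1 and 2.

```python
from typing import List, Sequence


def dominates(y: Sequence[float], z: Sequence[float]) -> bool:
    """True if y >= z in every component and y > z in at least one."""
    return all(a >= b for a, b in zip(y, z)) and any(a > b for a, b in zip(y, z))


def efficient_indices(outcomes: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the outcome vectors that no other outcome dominates."""
    return [
        i
        for i, y in enumerate(outcomes)
        if not any(dominates(z, y) for j, z in enumerate(outcomes) if j != i)
    ]


# Hypothetical outcomes (f_1(x), f_2(x)) of four feasible decisions.
outcomes = [(1.0, 4.0), (2.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(efficient_indices(outcomes))  # [0, 1, 3]
```

The third outcome is excluded because the second is at least as good in both components and strictly better in one. The comparison is quadratic in the number of candidates, which is acceptable only for small finite decision sets.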

[Figures 1 and 2: graphs of attainable outcomes; axis label f_2(x)]


In many applied decision problems, even in the absence of
uncertainty, there are several objective functions which naturally present
themselves to the decision-maker. In such situations, the relevance
of the vector maximum problem is obvious, and need not be emphasized
further. What we do wish to emphasize is that in the presence of
uncertainty even a single-criterion-function problem such as (1),
which we would accept as the "correct" formulation if β were known
exactly, tends to explode into vector maximum reformulations when
one attempts to turn it into a well-defined problem.

2/ The notion of an efficient solution is essentially the same as
the notion of "undominated" or "admissible" decisions in decision
theory, and the notion of "Pareto optimality" in game theory (see
Luce and Raiffa, 1957, p. 287 and p. 118).

Plan  of  Discussion 

Because uncertainty in the constraints is fundamentally different
from uncertainty in the objective function of (1), we split our
discussion into two parts: in section 2 we consider ways of reformulating
the constraints so as to be independent of β, and in section 3 we
consider ways of reformulating the objective function so as to be
independent of β (this is usually known as invoking a decision
criterion). These two steps must be accomplished in order to convert
(1) into a well-defined problem. The conversion usually can be
accomplished in several ways, reflecting various compromises which
may be made to uncertainty in β, realism in the final model, and
computational considerations.

In section 2, three reformulations of the constraints will be
discussed: permanent feasibility, the penalty function reformulation,
and probabilistic constraints. The first two do not require a
probability distribution over B, while the last does. The last two
reformulations sometimes lead to a vector maximum problem.

In section 3 we consider several decision criteria, and some
relations between them are noted. We suggest that a given decision
problem should be attacked by several decision criteria rather than
by only one. The result is, of course, a vector maximum problem. Two
examples are presented which demonstrate the usefulness of considering
two criteria simultaneously. The second example is a one-period
inventory model, and an argument is given for deviating from the
now classical solution.

2.  Treating  Uncertainty  in  the  Feasibility  Constraints 

This section is essentially a review of some of the existing
ways of circumventing uncertainty in the constraints, and is included
mainly for completeness. Mixtures and variations of these basic
approaches can be improvised to cover most particular applications.

The  Permanent  Feasibility  Reformulation 

To be absolutely sure of choosing a feasible decision, choice
must be limited to those values of x which are feasible for all
β ∈ B. That is, restrict attention to the set 1/

     ∩_{β ∈ B} {x: g(x,β) ≧ 0}

(see Madansky, 1962 and 1965).

An obvious difficulty with this reformulation is that when B
is "large," the permanently feasible set is apt to be "small," and
may even be empty. When the maximization operation is performed subsequently,
there may be little opportunity to achieve a satisfactorily high value
of the objective function.

1/ We adopt the notation of using braces to denote sets in this work.
The symbol ∅ denotes the empty set.
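
The shrinking of the permanently feasible set as B grows can be seen in a small finite case. The Python sketch below is illustrative only: the function `permanently_feasible`, the two constraints in `g`, and the candidate and scenario values are assumptions of ours, chosen so that the intersection over B can be computed by enumeration.

```python
from typing import Callable, Iterable, List, Sequence


def permanently_feasible(
    decisions: Iterable[float],
    scenarios: Sequence[float],
    g: Callable[[float, float], Sequence[float]],
) -> List[float]:
    """Keep each x with g_i(x, beta) >= 0 for every i and every beta in B."""
    return [
        x
        for x in decisions
        if all(g_i >= 0 for beta in scenarios for g_i in g(x, beta))
    ]


def g(x: float, beta: float) -> Sequence[float]:
    """Hypothetical constraints: x >= 0 and x <= beta (beta acts as a capacity)."""
    return (x, beta - x)


candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
small_B = [3.0, 4.0]
large_B = [1.0, 2.0, 3.0, 4.0]

print(permanently_feasible(candidates, small_B, g))  # [0.0, 1.0, 2.0, 3.0]
print(permanently_feasible(candidates, large_B, g))  # [0.0, 1.0]
```

Enlarging B from two scenarios to four cuts the permanently feasible candidates from four to two, since every decision must now respect the smallest capacity that might occur.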




The  Penalty  Function  Reformulation 


The above reformulation does not admit the possibility of ever
choosing a decision which is infeasible. What does it mean to say that
a decision x' is "infeasible" when, say, β' obtains? Mathematically,
we have g_i(x',β') < 0 for at least one i, which means that either
(x',β') is physically impossible, or is physically possible but
"undesirable" (we are distinguishing between those constraints which
are dictated by the physical limitations of the system and those which
are imposed at the model-maker's discretion). In the second case, it
may be possible to take additional action in order to make the outcome
less "undesirable," or at least to pay a price for being "infeasible."
Denote this "price" by p(x',β'), not necessarily measured in dollars.
Note that p is, in general, a vector-valued function, reflecting the
fact that violations of different constraints may imply different
dimensions of disutility. For example, consider an investment portfolio
optimization model which has as its objective the maximization of
portfolio worth at the end of a specified horizon. One constraint may
specify a desired level of diversification (e.g., a maximum fraction of
the portfolio in defense industries), and another constraint may specify
a lower bound on the average Standard and Poor's quality rating of the
securities. Violation of each of these constraints would be measured in
different units from the unit of measurement of the objective function.

The penalty function reformulation of (1) results, in general,
in a vector maximum problem of the form

(3)  "Maximize"_x f(x,β), -p(x,β).

An important special case arises when p has but one component,
and this component is additive with f. This reformulation then
becomes 1/

(3.1)  Maximize_x [f(x,β) - p(x,β)].

All of the two-stage "stochastic programming" problems (see, e.g.,
Dantzig, 1955, Madansky, 1962, and Mangasarian and Rosen, 1964) can
be thought of as penalty function reformulations. The basic idea of
these problems is to append a second stage to the original problem
to "correct for" possible infeasibility of the original decision; p
then represents the minimum cost of correcting for an infeasible x,
as affected by the then known actual value of β. The usual example
of a situation in which the two-period formulation may be appropriate
is the case of a manufacturer who is committed to produce to satisfy
an unknown demand β for his perishable products. If all of the
demand is not satisfied, then he purchases the difference on the open
market.
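
The manufacturer's situation can be written out as an additive penalty of the kind that appears when the penalty function has a single component. The Python sketch below is a minimal illustration under assumptions of our own: the selling price, unit cost and open-market price are hypothetical numbers, and the names `payoff`, `penalty` and `net_payoff` do not appear in the original.

```python
def payoff(x: float, beta: float, price: float = 5.0, unit_cost: float = 2.0) -> float:
    """f(x, beta): revenue from meeting all demand beta, less the cost of producing x."""
    return price * beta - unit_cost * x


def penalty(x: float, beta: float, market_price: float = 4.0) -> float:
    """p(x, beta): cost of buying the shortfall max(beta - x, 0) on the open market."""
    return market_price * max(beta - x, 0.0)


def net_payoff(x: float, beta: float) -> float:
    """Objective of the additive penalty reformulation, f(x, beta) - p(x, beta)."""
    return payoff(x, beta) - penalty(x, beta)


print(net_payoff(10.0, 12.0))  # 32.0: demand exceeds production, so a penalty is paid
print(net_payoff(10.0, 8.0))   # 20.0: no shortfall, but two perishable units are wasted
```

The net payoff still depends on the unknown demand, so a decision criterion for the objective function is needed before a production level can be chosen.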


Probabilistic  Constraints 

Assume that β may be regarded as a random variable, and that
its probability distribution over B is known.

1/ Note that (1) can be written equivalently in the form (3.1) if p is
taken to be arbitrarily large for infeasible combinations of x and
β, and equal to zero for feasible combinations. For example,

     Maximize_x [ Inf_{u ≧ 0} [f(x,β) + Σ_i u_i g_i(x,β)] ].
The notion of permanent feasibility may be relaxed if one requires
merely that each or all of the constraints must hold with at least
some prescribed probability. For example, consider

     Maximize_x f(x,β)
     subject to  Prob[g_i(x,β) ≥ 0] ≥ α_i,  i = 1, ..., m,

where 0 ≤ α_i ≤ 1 (i = 1,...,m). Charnes and Cooper (1959, 1963)
refer to this as "chance-constrained" programming. Note that when
each α_i is nearly one, this reformulation approaches the permanent
feasibility reformulation.
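
For a discrete distribution over B, the probability that a constraint holds is a finite sum and the chance constraints can be checked directly. The Python sketch below assumes a hypothetical four-point distribution and a single capacity constraint. The names `prob_satisfied`, `chance_feasible` and `capacity` are ours, and the probabilities are chosen to be exactly representable so that the comparisons are not disturbed by rounding.

```python
from typing import Callable, Sequence

Constraint = Callable[[float, float], float]


def prob_satisfied(
    x: float, scenarios: Sequence[float], probs: Sequence[float], g_i: Constraint
) -> float:
    """Prob[g_i(x, beta) >= 0] under a discrete distribution over B."""
    return sum(q for beta, q in zip(scenarios, probs) if g_i(x, beta) >= 0)


def chance_feasible(
    x: float,
    scenarios: Sequence[float],
    probs: Sequence[float],
    constraints: Sequence[Constraint],
    levels: Sequence[float],
) -> bool:
    """True if every constraint i holds with probability at least levels[i]."""
    return all(
        prob_satisfied(x, scenarios, probs, g_i) >= alpha_i
        for g_i, alpha_i in zip(constraints, levels)
    )


def capacity(x: float, beta: float) -> float:
    """Hypothetical constraint g(x, beta) = beta - x >= 0."""
    return beta - x


scenarios = [1.0, 2.0, 3.0, 4.0]
probs = [0.125, 0.125, 0.25, 0.5]

print(prob_satisfied(3.0, scenarios, probs, capacity))               # 0.75
print(chance_feasible(3.0, scenarios, probs, [capacity], [0.75]))    # True
print(chance_feasible(4.0, scenarios, probs, [capacity], [0.75]))    # False
print(chance_feasible(2.0, scenarios, probs, [capacity], [1.0]))     # False
```

With the level raised to one, only x = 1 or smaller passes, which is the permanently feasible set for this example.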

Another probabilistic constraint reformulation is

     Maximize_x f(x,β)  subject to  E[g(x,β)] ≧ 0,

where "E" denotes expectation.

As an alternative to the formulations above, one may incorporate
some or all of the probabilistic constraints in the objective function,
e.g.,

     "Maximize"_x f(x,β), Prob[g_1(x,β) ≥ 0]
     subject to  Prob[g_i(x,β) ≥ 0] ≥ α_i,  i = 2, 3, ..., m.

The efficient solutions to the resulting vector maximum problem show
clearly the available trade-offs between the original objective function
and assurance that various of the constraints will be met.
3. Treating Uncertainty in the Objective Function


In section 2 we discussed several ways of reformulating the
constraints so as to be independent of β. Here we assume that this
has been accomplished, and discuss several ways of reformulating the
objective functions so as to be independent of β. For the sake of
simplicity of discussion, we shall treat the case of but a single
objective function, so that the problem to be considered in this section
can be rewritten as

(4)  Maximize_{x ∈ X} f(x,β).

As before, β is known to lie in a given set B, and X is the
set of feasible decisions.

Since it is necessary to choose a decision x before β is
revealed (if it is ever revealed), f(x,β) must be replaced by a
known function of x alone. That is, (4) must be reformulated as

(4.0)  Maximize_{x ∈ X} f(x),

where f is a known function to be chosen. The choice of f in a
given situation is equivalent to what is customarily known as the choice
of a decision criterion. If a decision is an optimal solution of
(4.0), it is said to satisfy the decision criterion which produces f(x)
from f(x,β).

After first discussing two alternative restatements of (4), we
shall briefly summarize the admissibility criterion, the maxmin payoff
criterion, the estimate criterion, and the Principle of Insufficient
Reason. The difficulty of finding a single ideal decision criterion
is well-known, and so we take the position that it may be more useful
to select two criteria, each with distinct merits of its own, and
recast (4) as a vector maximum problem (each component of the
vector-valued objective function is derived from one decision criterion).
An example is presented to illustrate the possible advantages of such
a procedure.

We then shall assume that a probability distribution over B is
given. The concept of stochastic admissibility is introduced as a
generalization of the ordinary concept of admissibility. Next we
examine three decision criteria for reducing (4) to a well-defined
problem, with heavy emphasis on a geometric motivation for each in
order to gain insight and understanding. These are the maximum
expected payoff criterion, the maximum α-fractile criterion (maximize
the α-fractile of the distribution of f(x,β) under the probability
distribution of β, for some preselected α), and an aspiration
criterion (maximize the probability of achieving at least some
prescribed level of payoff). Several propositions are proved which
relate these criteria to each other and to the previously mentioned
criteria which do not involve probabilities. Finally, a one-period
inventory example is presented to illustrate the ideas of this section
and to support the suggestion that several criteria, rather than a
single one, should be selected to embody the conflicting aims of the
decision-maker. The resulting vector maximum problem should then be
solved in place of (4).
Alternative Problem Statements


In some situations the objective function of (4) can be written
as f(x,β) = F_1(x) + F_2(x,β). If F_1 and F_2 each represent a
quantity which the decision-maker wants to maximize, one may
reformulate (4) as a two-component vector maximum problem

     "Maximize"_{x ∈ X} F_1(x), F_2(x,β),

so as to quarantine the part depending on β. The advantage of this
formulation is that the decision-maker gains a clearer understanding
of how his objectives are influenced by uncertainty. As an example,
let F_1 represent the immediate payoff of a multistage decision
problem, and let F_2 represent the present worth of the future payoffs,
where β represents the future values of exogenous variables.

Another restatement of (4) is obtained by using regret in place
of payoff. Assume that Max_{x ∈ X} f(x,β) is achieved for each β ∈ B.
The regret due to making decision x and then observing β is defined
to be

     r(x,β) = [Max_{x' ∈ X} f(x',β)] - f(x,β).

Stating problems in terms of regret rather than payoff has the advantage
of highlighting the consequences of uncertainty in β dramatically.
In addition, regret may have more tractable mathematical properties
than payoff (assuming that the indicated maximization operation is
not overly difficult), due to non-negativity and sometimes symmetry.
When β is known exactly, maximizing payoff is, of course,
exactly equivalent to minimizing regret. When β is uncertain,
however, and various criteria are applied in order to arrive at a
decision, it is well-known that different decisions often result
depending on whether payoff or regret is used.

In this work the discussion will be carried on primarily in
terms of payoff, but with the obvious modifications each criterion
can be applied to regret as well.

3.1 Reformulations not Involving Probabilities

We  shall  briefly  review  a  few  classical  decision  criteria  which 
do  not  involve  probabilities.  An  example  is  given  to  illustrate  that 
it  can  be  more  useful  to  consider  several  criteria  simultaneously 
rather  than  to  search  for  a  single  ideal  criterion. 

Admissibility  Criterion 

Consider (4). A decision x' is said to be admissible (with
respect to X and B) if x' ∈ X and if there exists no other
decision x" ∈ X such that f(x",β) ≥ f(x',β) for all β ∈ B, with
strict inequality holding for some value of β ∈ B. If such a decision
x" did exist, it would be said to dominate x' (one may also define
weak dominance by dropping the proviso that strict inequality must
hold for some value of β). The admissibility criterion requires
that one choose an admissible decision. In other words, if a(x)
is defined to be equal to 0 if x is admissible and equal to -1
if x is inadmissible, (4) is reformulated as:

(4.1)  Maximize_{x ∈ X} a(x).
The difficulties with this criterion are twofold: the set of
admissible decisions may be onerous to determine computationally, and
this set may be quite a large subset of X.

Maxmin  Payoff  Criterion 

A conservative decision-maker might invoke the maxmin payoff
criterion, which yields

(4.2)  Maximize_{x ∈ X} [Inf_{β ∈ B} f(x,β)].

The corresponding criterion in terms of regret is known, of course,
as the minmax regret criterion.
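
When X and B are both finite, the maxmin payoff and minmax regret criteria reduce to row and column operations on a payoff table, and the infimum becomes a minimum. The Python sketch below uses a hypothetical three-by-two table of our own (rows are decisions, columns are values of β) and takes regret to be the best payoff attainable under a given β less the payoff actually received. The function names are illustrative assumptions.

```python
from typing import List, Sequence

Table = Sequence[Sequence[float]]


def maxmin_payoff(payoff: Table) -> int:
    """Index of the decision whose worst-case payoff is largest."""
    worst = [min(row) for row in payoff]
    return max(range(len(payoff)), key=lambda i: worst[i])


def regret_table(payoff: Table) -> List[List[float]]:
    """r[i][j] = (best payoff in column j) - payoff[i][j]."""
    best = [max(column) for column in zip(*payoff)]
    return [[b - v for v, b in zip(row, best)] for row in payoff]


def minmax_regret(payoff: Table) -> int:
    """Index of the decision whose worst-case regret is smallest."""
    worst = [max(row) for row in regret_table(payoff)]
    return min(range(len(payoff)), key=lambda i: worst[i])


# Hypothetical payoffs f(x, beta): three decisions, two values of beta.
payoff = [
    [4.0, 4.0],
    [2.0, 9.0],
    [0.0, 10.0],
]

print(maxmin_payoff(payoff))   # 0
print(regret_table(payoff))    # [[0.0, 6.0], [2.0, 1.0], [4.0, 0.0]]
print(minmax_regret(payoff))   # 1
```

In this table the two criteria select different decisions, which is an instance of the well-known discrepancy between payoff and regret formulations under uncertainty.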

Estimate  Criterion 

The estimate criterion requires that one pick a value for β,
say β', and then act as though β' were the true value of β. 1/
That is, solve

(4.3)  Maximize_{x ∈ X} f(x,β').

Since β' may be chosen to be any point in B, we see that we
really have a whole family of criteria.

1/ This criterion is included in order to formalize the common practice
of using judgmental or engineering approximations to costs and other
parameters of decision models. The notion of an estimate is related
to the idea of a certainty equivalent, which will be discussed at the
end of subsection 3.2. It should be noted that this criterion may
also be invoked when β is regarded as a random variable, and in
fact, the expected value of β is a popular estimate.
The computational advantages of this approach are obvious. It
is not so obvious that there exists a "good" estimate in B, or
how to find one.

The  Principle  of  Insufficient  Reason 

Assume that B consists of a finite number (k) of elements,
each denoted by β^i. Then the Principle of Insufficient Reason asserts
that one should replace (4) by

(4.4)  Maximize_{x ∈ X} (1/k) Σ_{i=1}^k f(x,β^i).

Comparison of Criteria

The above decision criteria are representative of the methods
which have been proposed in an effort to circumvent uncertainty in
the objective function in the absence of probabilities. The
difficulties of selecting one criterion which satisfies all of a
comprehensive set of intuitively appealing desiderata for "rational"
decision-making are well-known (see, e.g., Luce and Raiffa, 1957,
Chapter 13), and suggest the futility of seeking an ideal criterion.
One possible way out of this dilemma is to consider several criteria
at once, and thus to reformulate (4) as a vector maximum problem.
The actual choice of a decision would be made on an ad hoc basis
from the set of efficient solutions.

Table 1 defines a decision problem in which there are four
possible values of β, and five possible decisions. The entries
give the values of f(x^j,β^i) and the consequences of each possible
decision in terms of average payoff (on which the Principle of
Insufficient Reason is based) and in terms of minimum payoff (on
which the maxmin payoff criterion is based). Figure 3 graphs these
consequences.

All  decisions  are  admissible.  The  Principle  of  Insufficient 
Reason  would  lead  to  the  choice  of  decision  number  two,  while  the 
maxmin  payoff  criterion  leads  to  the  fifth  decision.  However,  it 
seems  reasonable  to  favor  the  fourth  decision  over  any  of  the  others 
because  it  comes  very  close  to  satisfying  both  of  the  above  criteria. 

We submit that by judicious choice of two criteria the resulting
vector maximum reformulation of (4) can be expected to lead to a more
satisfactory decision than a single criterion.
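
The pattern described above can be reproduced with a small table. The Python sketch below uses hypothetical payoffs of our own devising, not the entries of Table 1, chosen so that all five decisions are admissible, the average-payoff criterion selects decision two, the minimum-payoff criterion selects decision five, and decision four comes close on both counts. The names `table`, `criteria`, `admissible` and `efficient` are illustrative assumptions.

```python
# Hypothetical payoffs: decision number -> payoffs under four values of beta.
table = {
    1: [0, 18, 6, 8],
    2: [24, 2, 16, 6],
    3: [4, 4, 4, 20],
    4: [10, 12, 10, 14],
    5: [11, 11, 11, 11],
}


def dominates(u, v):
    """True if u >= v in every component and u > v in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


# Admissibility compares decisions state by state.
admissible = [
    k for k in table
    if not any(dominates(table[j], table[k]) for j in table if j != k)
]

# The two criteria: (average payoff, minimum payoff) for each decision.
criteria = {k: (sum(v) / len(v), min(v)) for k, v in table.items()}

# Efficiency compares decisions in the two-criterion outcome space.
efficient = [
    k for k in criteria
    if not any(dominates(criteria[j], criteria[k]) for j in criteria if j != k)
]

print(admissible)   # [1, 2, 3, 4, 5]
print(criteria)     # decision 2: (12.0, 2), decision 4: (11.5, 10), decision 5: (11.0, 11)
print(efficient)    # [2, 4, 5]
```

Under these assumed numbers the vector maximum reformulation narrows five admissible decisions to three efficient ones, and decision four gives up half a unit of average payoff relative to decision two in exchange for eight units of minimum payoff.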

3.2 Reformulations Involving Probabilities

With the additional assumption that β may be regarded as a
random variable, one may choose to regard (4) as a continuous game
in normal form. This viewpoint, and the consequent game-theoretic
solutions, will not be considered here. Instead it will be assumed
that β has a known probability distribution μ over B and so
(4) may be regarded as a game against a neutral "Nature." That is,
we are in what is sometimes known as a situation of individual
decision-making under "risk."

The  principal  tenet  of  utility  theory  (an  excellent  summary  is 
given  in  Luce  and  Raiffa,  1957,  Chapter  2)  is  that  for  a  "rational" 
decision-maker  there  exists  a  utility  transformation  of  f,  which 
we denote by $u(f)$, such that the most preferred decision is an optimal solution of:

$$\operatorname*{Maximize}_{x \in X} \; E[u(f(x,\beta))].$$

If  one  accepts  any  of  the  sets  of  axioms  of  rational  behavior  leading 
to  this  result,  then  the  maximum  expected  utility  criterion  is  justified 
provided  that  the  required  utility  transformation  is  at  hand. 

Unfortunately  it  may  be  very  tedious  actually  to  determine  u(f). 
For  this  reason  (and  also  because  of  certain  reservations  which  we 
have  with  regard  to  the  axioms  of  utility  theory) ,  we  shall  consider 
other criteria which can be applied directly to $f(x,\beta)$ without the
need  for  a  utility  transformation.  We  begin  by  introducing  a  natural 
analog  of  the  admissibility  criterion. 

Stochastic  Admissibility  Criterion 

For fixed $x$, $\mu$ induces a probability distribution on $f$ which may be plotted in cumulative form as in Figure 4 (each curve represents the cumulative distribution function of $f$ corresponding to a different value of $x$). Loosely speaking, one wishes to perform (4) by choosing an $x$ which determines a c.d.f. that is uniformly as low (or, equivalently, as far to the right) as possible. In Figure 4 it is clear that
the  c.d.f.  determined  by  x^  must  be  strictly  preferred  to  that  of 
x^,  while  x^  need  not  be  preferred  to  Xy  Observe  that  although 
the  probability  density  functions  determined  by  x^  and  Xg  overlap, 
the  c.d.f. 's  do  not. 

We formalize the above ideas in terms of the concept of stochastic dominance. A decision $x^\circ$ is said to stochastically dominate $x'$ if $\mathrm{Prob}[f(x^\circ,\beta) < k] \le \mathrm{Prob}[f(x',\beta) < k]$ for all real $k$, with strict inequality holding for at least one value of $k$ (if we drop the proviso that strict inequality must hold for at least one value of $k$, then we use the term weak stochastic dominance). If a feasible decision is not stochastically dominated by any other feasible decision, it is said to be stochastically admissible. The stochastic admissibility criterion requires that one choose a stochastically admissible decision (this criterion can be written in a form similar to (4.1)).

Since stochastic admissibility is defined in terms of $X$ and the particular distribution $\mu$, to be precise we should qualify stochastic admissibility as being "with respect to $X$ and $\mu$." We omit this qualification for the sake of brevity, since no confusion is likely to result in our discussion.

Remark: Although we do not choose to do so in this paper, it is possible to strengthen the stochastic admissibility criterion somewhat by permitting randomized decisions over $X$. One would say that the feasible decision $x'$ is stochastically inadmissible under a randomized decision strategy if there exists a probability distribution $\lambda$ on $X$ not involving $x'$ such that

$$\mathrm{Prob}_{\mu,\lambda}[f(x,\beta) < k] \le \mathrm{Prob}_{\mu}[f(x',\beta) < k] \quad \text{for all } k,$$

with strict inequality holding for at least one value of $k$. For example, in Figure 4, $x_5$ is stochastically dominated by the randomized strategy which chooses $x_4$ and one other decision each with a probability of one-half, even though neither of these two decisions stochastically dominates $x_5$ alone. Randomized decision rules have the effect of taking vertically convex combinations of the c.d.f.'s. It is clear that the set of stochastically admissible decisions allowing randomized strategies is contained in the set of stochastically admissible decisions allowing only pure strategies.

We  now  explore  the  relationship  between  ordinary  and  stochastic 
admissibility. 

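Proposition 1:

Let $\mu$ vanish outside of $B$. If $x^\circ$ weakly dominates $x'$, then $x^\circ$ weakly stochastically dominates $x'$.

Proof: We must show that for all real $k$, $\mathrm{Prob}[f(x^\circ,\beta) < k] \le \mathrm{Prob}[f(x',\beta) < k]$. By the definition of (non-stochastic) weak dominance, we have $f(x',\beta) \le f(x^\circ,\beta)$ for all $\beta \in B$. Thus for any fixed value of $k$, $f(x^\circ,\beta) < k$ implies $f(x',\beta) < k$, and so for each $k$ we have

$$\{\beta \in B : f(x^\circ,\beta) < k\} \subseteq \{\beta \in B : f(x',\beta) < k\}.$$

Since $\mu$ vanishes outside of $B$ and a probability measure is monotone with respect to set inclusion, the proposition follows.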

Remark: To see that the converse of this proposition need not hold, consider the following example. $X = \{x^\circ, x^1\}$, $B = \{\beta_1, \beta_2\}$, $f(x^\circ,\beta_1) = f(x^1,\beta_2) = 1$, $f(x^\circ,\beta_2) = f(x^1,\beta_1) = 2$, $\mathrm{Prob}[\beta = \beta_1] = .2$, $\mathrm{Prob}[\beta = \beta_2] = .8$. Then $x^\circ$ stochastically dominates $x^1$, but $x^\circ$ does not weakly dominate $x^1$.
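The counterexample in the preceding remark can be checked mechanically. The sketch below is a minimal illustration for finite distributions only; the helper names and the labels `b1`, `b2` are assumptions introduced here. It uses the convention that a c.d.f. is written as Prob[payoff < k].

```python
def payoff_dist(payoff_by_beta, prob_by_beta):
    """Distribution {payoff: probability} of f(x, beta) induced by the distribution of beta."""
    dist = {}
    for b, v in payoff_by_beta.items():
        dist[v] = dist.get(v, 0.0) + prob_by_beta[b]
    return dist

def cdf_strict(dist, k):
    """Prob[payoff < k] for a discrete distribution {payoff: probability}."""
    return sum(p for v, p in dist.items() if v < k)

def stochastically_dominates(d0, d1, tol=1e-12):
    """True if d0's c.d.f. is nowhere above d1's and is strictly below it somewhere."""
    # Prob[payoff < k] is a step function that is constant on each interval
    # (v_i, v_{i+1}] between attained payoffs, so probing every attained
    # payoff and one point beyond the largest covers all real k.
    values = sorted(set(d0) | set(d1))
    probes = values + [values[-1] + 1.0]
    diffs = [cdf_strict(d0, k) - cdf_strict(d1, k) for k in probes]
    return all(d <= tol for d in diffs) and any(d < -tol for d in diffs)

def weakly_dominates(f0, f1):
    """Ordinary (non-stochastic) weak dominance: f0 >= f1 for every beta."""
    return all(f0[b] >= f1[b] for b in f0)

prob = {"b1": 0.2, "b2": 0.8}
f_x0 = {"b1": 1, "b2": 2}   # payoffs of x°
f_x1 = {"b1": 2, "b2": 1}   # payoffs of x^1
d0, d1 = payoff_dist(f_x0, prob), payoff_dist(f_x1, prob)
print(stochastically_dominates(d0, d1), weakly_dominates(f_x0, f_x1))
```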

With additional hypotheses, one may strengthen Proposition 1.

Proposition 2:

Let $f(x,\beta)$ be continuous on $B$ for each $x \in X$, and let $\mu$ be positive everywhere (*) on and vanish outside of $B$. If $x^\circ$ dominates $x'$, then $x^\circ$ stochastically dominates $x'$.

(*) A probability distribution is said to be positive everywhere on $B$ if, for each $\beta^\circ \in B$ and for every ($b$-dimensional) neighborhood $N$ of $\beta^\circ$, the event $[\beta \in N \cap B]$ has a non-zero probability. A neighborhood of $\beta^\circ$ of radius $\rho$ is defined as $\{\beta : \sum_i (\beta_i^\circ - \beta_i)^2 < \rho\}$, and is denoted by $N_\rho(\beta^\circ)$ when a complete notation is desired.

Proof: From Proposition 1 we have that $x^\circ$ weakly stochastically dominates $x'$. It remains to show that $\mathrm{Prob}[f(x^\circ,\beta) < k^*] < \mathrm{Prob}[f(x',\beta) < k^*]$ for some $k^*$. Since $x^\circ$ dominates $x'$, there exists $\beta^* \in B$ such that $f(x^\circ,\beta^*) > f(x',\beta^*)$. Put $k^* = \tfrac{1}{2}\,(f(x^\circ,\beta^*) + f(x',\beta^*))$. By the continuity of $f$ there is a neighborhood $N^*$ of $\beta^*$ such that $f(x^\circ,\beta) > k^* > f(x',\beta)$ for all $\beta \in N^* \cap B$, and so by the positivity of $\mu$ on $B$ we have $\mathrm{Prob}[f(x^\circ,\beta) > k^* > f(x',\beta)] > 0$. This fact, together with the dominance of $x^\circ$ over $x'$ (so that $f(x^\circ,\beta) < k^*$ implies $f(x',\beta) < k^*$), yields

$$\begin{aligned}
\mathrm{Prob}[f(x',\beta) < k^*] &= \mathrm{Prob}[f(x',\beta) < k^* \le f(x^\circ,\beta)] + \mathrm{Prob}[f(x',\beta) < k^* > f(x^\circ,\beta)] \\
&= \mathrm{Prob}[f(x',\beta) < k^* \le f(x^\circ,\beta)] + \mathrm{Prob}[f(x^\circ,\beta) < k^*] \\
&> \mathrm{Prob}[f(x^\circ,\beta) < k^*].
\end{aligned}$$

Proposition 2 shows that, under the given assumptions, the set of stochastically admissible decisions is contained in the set of admissible decisions, as one would expect and hope. To see that the set of stochastically admissible decisions can be considerably smaller than the set of admissible decisions, consider the example

$$\operatorname*{Maximize}_{x \in R^1} \; [10 - (\beta - x)^2],$$

where $\mu$ is the Normal distribution with mean $\bar\beta$ and variance $\sigma^2$, and $B = R^1$. Viewing the objective function as a family of functions of $\beta$ indexed by $x$, this family is seen to consist of concave parabolas which are identical except for the axis of symmetry, which occurs at $\beta = x$. Clearly every $x^\circ \in R^1$ is admissible, for $f(x^\circ,\beta = x^\circ) = 10 > f(x,\beta = x^\circ)$ for all $x \ne x^\circ$. It is also clear that $x' \ne \bar\beta$ is stochastically inadmissible, for $\mathrm{Prob}[f(\bar\beta,\beta) < k] \le \mathrm{Prob}[f(x',\beta) < k]$ for all $k$. To see this assertion, observe that $\{\beta : f(x,\beta) \ge k\}$ is an interval of width $2(10-k)^{1/2}$ centered at $\beta = x$. By the symmetry and unimodality of the Normal distribution, the interval centered at $\beta = \bar\beta$ must include the greatest probability for any $k$, and hence $\mathrm{Prob}[f(\bar\beta,\beta) \ge k] \ge \mathrm{Prob}[f(x',\beta) \ge k]$, with strict inequality for $k < 10$, when $x' \ne \bar\beta$, which is equivalent to the assertion that $x' \ne \bar\beta$ is stochastically inadmissible. Since $x = \bar\beta$ is stochastically admissible, we see that only $x = \bar\beta$ is stochastically admissible, whereas all $x$ are admissible.
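The interval argument in the Normal example can be illustrated numerically. In the sketch below the mean 3 and standard deviation 2 are hypothetical values chosen only for illustration, and the function names are assumptions. It evaluates Prob[f(x, β) ≥ k] as the Normal probability of an interval of half-width (10 − k)^(1/2) centered at x, and compares the decision at the mean with a few off-center decisions.

```python
import math

def normal_cdf(z):
    """Standard Normal c.d.f. expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_payoff_at_least(x, k, mean, sd):
    """Prob[10 - (beta - x)^2 >= k] when beta ~ Normal(mean, sd^2)."""
    if k > 10:
        return 0.0
    w = math.sqrt(10.0 - k)  # half-width of the interval {beta: f(x, beta) >= k}
    return normal_cdf((x + w - mean) / sd) - normal_cdf((x - w - mean) / sd)

mean, sd = 3.0, 2.0                      # hypothetical parameters
levels = [-20.0, 0.0, 5.0, 9.0, 9.99]    # payoff levels k below 10
for x in [mean - 4.0, mean - 1.0, mean + 0.5, mean + 3.0]:
    centered_wins = all(
        prob_payoff_at_least(mean, k, mean, sd) > prob_payoff_at_least(x, k, mean, sd)
        for k in levels
    )
    print(x, centered_wins)
```

For each off-center decision tried, the decision x equal to the mean has the larger probability of reaching every level k tested, which is what stochastic dominance requires.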

The Maximum $\alpha$-Fractile and the Aspiration Criteria

In terms of Figure 4, we would like to choose a decision which achieves the lower envelope of c.d.f.'s everywhere. In general this is impossible, but we can attempt to achieve it at a single point and hope that this one point will "pin down" a c.d.f. so that it is close to the lower envelope. The point may be specified in terms of its ordinate or abscissa value, whichever seems most natural in a given problem context. The criteria implied by this idea are, respectively and loosely:

Criterion F: Choose an $x$ which corresponds to a c.d.f. which approaches the lower envelope of c.d.f.'s at an ordinate value of $\alpha$ ($0 \le \alpha < 1$).

Criterion A: Choose an $x$ which corresponds to a c.d.f. which approaches the lower envelope at an abscissa value of $M$ ($-\infty < M < \infty$).

It is evident that we have two entire families of criteria here, indexed by $\alpha$ and $M$ respectively. Criterion F with $\alpha = 0.1$ would lead to the choice of $x_2$ in Figure 4, and Criterion A with $M = 20$ would lead to the choice of $x_4$.

Criterion F is equivalent to maximizing the $\alpha$-fractile (*) of the distribution of $f(x,\beta)$ under $\mu$. That is, it maximizes the payoff level below which there is at most an $\alpha$ probability of falling. (**)

(*) We define the $\alpha$-fractile of a (possibly mixed) cumulative distribution function $F(y) = \mathrm{Prob}[Y < y]$ as $\mathrm{Sup}\{k : F(k) \le \alpha\}$.

(**) See Kataoka (1965) for a linear programming model of this type. It is one of the few published references to this criterion.

It corresponds, for fixed $0 \le \alpha < 1$, to:

$$\begin{aligned}
&\operatorname*{Maximize}_{k,\,x} \; k \\
&\text{subject to } x \in X, \\
&\phantom{\text{subject to }} \mathrm{Prob}[f(x,\beta) < k] \le \alpha.
\end{aligned} \tag{4.5}$$

When $\alpha$ is small, say less than 0.1, this criterion should appeal to conservative decision-makers because it tends to control the lower tail of the distribution of payoffs. When $\alpha = 1/2$, (4.5) maximizes the median of the distribution of payoffs, of course. We sometimes use the mnemonic notation $F(\alpha)$ for this criterion.

Criterion A is equivalent to maximizing the probability of exceeding a prescribed "aspiration" level $M$ of payoff (see Charnes and Cooper, 1963, for an application to linear programming). It corresponds to:

$$\operatorname*{Minimize}_{x \in X} \; \mathrm{Prob}[f(x,\beta) < M]. \tag{4.6}$$

We sometimes use the notation $A(M)$ for this criterion.
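For finite payoff distributions, criteria F(α) and A(M) can be evaluated directly from the definitions above. The following is a minimal sketch under that assumption; the three payoff distributions and all function names are hypothetical. The fractile is computed as Sup{k : Prob[payoff < k] ≤ α}, which for a discrete distribution and α < 1 is the largest attained payoff v with Prob[payoff < v] ≤ α.

```python
def cdf_strict(dist, k):
    """F(k) = Prob[payoff < k] for a discrete distribution {payoff: probability}."""
    return sum(p for v, p in dist.items() if v < k)

def fractile(dist, alpha, tol=1e-12):
    """Sup{k : F(k) <= alpha}, for 0 <= alpha < 1 and a discrete distribution."""
    return max(v for v in dist if cdf_strict(dist, v) <= alpha + tol)

def criterion_F(dists, alpha):
    """Decisions maximizing the alpha-fractile, as in problem (4.5)."""
    best = max(fractile(d, alpha) for d in dists.values())
    return sorted(x for x, d in dists.items() if fractile(d, alpha) == best)

def criterion_A(dists, M, tol=1e-12):
    """Decisions minimizing Prob[payoff < M], as in problem (4.6)."""
    best = min(cdf_strict(d, M) for d in dists.values())
    return sorted(x for x, d in dists.items() if cdf_strict(d, M) - best < tol)

# Hypothetical payoff distributions for three decisions.
dists = {
    "x1": {0: 0.05, 10: 0.45, 30: 0.50},
    "x2": {8: 0.50, 12: 0.50},
    "x3": {-5: 0.20, 40: 0.80},
}

print(criterion_F(dists, 0.1), criterion_F(dists, 0.5))
print(criterion_A(dists, 20), criterion_A(dists, 10))
```

In this made-up data the choice depends strongly on α and M: a small α favors the decision with the best lower tail, while the median (α = 1/2) favors a different one.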

Remark: It is to be noted that all cumulative distribution functions in this subsection are written as $\mathrm{Prob}[f(x,\beta) < k]$ rather than as $\mathrm{Prob}[f(x,\beta) \le k]$ (regard $x$ as being fixed). This convention is followed in order to avoid some minor difficulties which would be encountered by these two criteria if the opposite convention were adopted and the c.d.f.'s were discontinuous.

We introduced these two criteria together because of their intimate mathematical relationship, as well as their common graphical motivation. When the lower envelope is attained by some $x$ at every point, and is continuous and strictly increasing, it is geometrically clear that the F and A criteria are complementary in the sense that for every $\alpha$ there is an $M$ which leads to the same set of decisions, and conversely. Without such assumptions, however, the complementarity is weakened, as we shall see in the following two easy propositions.

Proposition 3:

(i) Assume that criterion $F(\alpha^\circ)$ is satisfied by at least one decision. Then the set of decisions which satisfy criterion $F(\alpha^\circ)$ contains the set of decisions which satisfy criterion $A(M^\circ)$, where $M^\circ$ is the maximum $\alpha^\circ$-fractile.

(ii) Assume that criterion $A(M^\circ)$ is satisfied by at least one decision. Then the set of decisions which satisfy criterion $A(M^\circ)$ contains the set of decisions which satisfy criterion $F(\alpha^\circ)$, where $\alpha^\circ = \operatorname*{Min}_{x \in X} \mathrm{Prob}[f(x,\beta) < M^\circ]$.

Proof: (i). Let $x^*$ satisfy $F(\alpha^\circ)$, and let $M^\circ$ be the maximum $\alpha^\circ$-fractile. If $x^\circ$ satisfies $A(M^\circ)$, then $\mathrm{Prob}[f(x^\circ,\beta) < M^\circ] \le \mathrm{Prob}[f(x^*,\beta) < M^\circ] \le \alpha^\circ$, and so $x^\circ$ must also satisfy $F(\alpha^\circ)$.

(ii). Let $x^*$ satisfy $A(M^\circ)$, and let $\alpha^\circ = \operatorname*{Min}_{x \in X} \mathrm{Prob}[f(x,\beta) < M^\circ] = \mathrm{Prob}[f(x^*,\beta) < M^\circ]$. If $x^\circ$ satisfies $F(\alpha^\circ)$, then there exists $k^\circ \ge M^\circ$ such that $\mathrm{Prob}[f(x^\circ,\beta) < k^\circ] \le \alpha^\circ$; since $k^\circ \ge M^\circ$, we have $\mathrm{Prob}[f(x^\circ,\beta) < M^\circ] \le \mathrm{Prob}[f(x^\circ,\beta) < k^\circ] \le \alpha^\circ$, from which it follows that $x^\circ$ must satisfy $A(M^\circ)$.

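Proposition 4:

(i) If $x^\circ$ satisfies criterion $F(\alpha^\circ)$ uniquely, then it satisfies criterion $A(M^\circ)$ uniquely, where $M^\circ$ is the maximum $\alpha^\circ$-fractile.

(ii) If $x^\circ$ satisfies criterion $A(M^\circ)$ uniquely, then it satisfies criterion $F(\alpha^\circ)$ uniquely, where $\alpha^\circ = \mathrm{Prob}[f(x^\circ,\beta) < M^\circ]$.

Proof: (i). Suppose that $x^\circ$ does not satisfy $A(M^\circ)$ uniquely. Then there exists $x' \in X$, $x' \ne x^\circ$, such that $\mathrm{Prob}[f(x',\beta) < M^\circ] \le \mathrm{Prob}[f(x^\circ,\beta) < M^\circ] \le \alpha^\circ$, so that the $\alpha^\circ$-fractile of $x'$ is at least $M^\circ$. This contradicts the fact that $x^\circ$ satisfies $F(\alpha^\circ)$ uniquely.

(ii). Suppose that $x^\circ$ does not satisfy $F(\alpha^\circ)$ uniquely. Then there exist $k^\circ \ge M^\circ$ and $x' \in X$, $x' \ne x^\circ$, such that $\mathrm{Prob}[f(x',\beta) < k^\circ] \le \alpha^\circ = \mathrm{Prob}[f(x^\circ,\beta) < M^\circ]$. Since $k^\circ \ge M^\circ$, we have $\mathrm{Prob}[f(x',\beta) < M^\circ] \le \mathrm{Prob}[f(x',\beta) < k^\circ]$, and so $\mathrm{Prob}[f(x',\beta) < M^\circ] \le \mathrm{Prob}[f(x^\circ,\beta) < M^\circ]$. This contradicts the fact that $x^\circ$ satisfies $A(M^\circ)$ uniquely.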

It is possible for criteria $F(\alpha)$ and $A(M)$ to lead to stochastically inadmissible decisions. The next proposition is of interest in this regard.

Proposition 5:

(i) If $x^\circ$ satisfies criterion $F(\alpha^\circ)$ uniquely, then $x^\circ$ also satisfies the stochastic admissibility criterion.

(ii) If $x^\circ$ satisfies criterion $A(M^\circ)$ uniquely, then $x^\circ$ also satisfies the stochastic admissibility criterion.

Proof: (i). In view of part (i) of Proposition 4, to prove (i) it is sufficient to prove (ii).

(ii). Let $x^\circ$ satisfy $A(M^\circ)$ uniquely, so that $\mathrm{Prob}[f(x^\circ,\beta) < M^\circ] < \mathrm{Prob}[f(x,\beta) < M^\circ]$ for all $x \in X$, $x \ne x^\circ$. Suppose that $x^\circ$ were stochastically inadmissible. Then there would exist $x' \in X$, $x' \ne x^\circ$, such that $\mathrm{Prob}[f(x',\beta) < k] \le \mathrm{Prob}[f(x^\circ,\beta) < k]$ for all $k$. Letting $k = M^\circ$, one would obtain a contradiction.

Now we turn to the relationship between the maxmin payoff criterion and the maximum $\alpha$-fractile criterion with $\alpha = 0$. It is not at all surprising that under mild assumptions these criteria are in fact equivalent, i.e., the same decisions satisfy both.

Proposition 6:

Assume that $f(x,\beta)$ is upper semicontinuous (10/) on $B$ for each $x \in X$, and that $\mu$ is positive on and vanishes outside of $B$. Then the maxmin payoff criterion is equivalent to the maximum 0-fractile criterion.

10/ Let $x$ be fixed in $X$. Then $f(x,\beta)$ is upper semicontinuous at $\beta^\circ \in B$ if for each $\epsilon > 0$ there exists $\delta > 0$ (depending on $\beta^\circ$ and $\epsilon$) such that $f(x,\beta) < f(x,\beta^\circ) + \epsilon$ whenever $\beta \in N_\delta(\beta^\circ)$. If $f$ is continuous, then $f$ is upper semicontinuous. Also, recall that if $B$ is a finite point set in $R^b$ then $f(x,\beta)$ is automatically continuous on $B$.

Proof: We shall rewrite (4.2) and (4.5) in such a way as to emphasize their similarity, and then show that they are in fact identical.

The maxmin payoff criterion can be written (11/)

$$\operatorname*{Maximize}_{x \in X} \; [\mathrm{Sup}\{k : f(x,\beta) \ge k, \; \forall \beta \in B\}],$$

and the maximum 0-fractile criterion can be written

$$\operatorname*{Maximize}_{x \in X} \; [\mathrm{Sup}\{k : \mathrm{Prob}[f(x,\beta) \ge k] = 1\}].$$

11/ This problem follows from the definition of "inf" as the greatest lower bound.

Define $S_1(x)$ and $S_2(x)$ to be the sets appearing in the first and second problems, respectively, for fixed $x$. Clearly $S_1(x) \subseteq S_2(x)$, $\forall x \in X$, for $\mu$ vanishes outside of $B$. The proof will be complete when we show that $S_2(x) \subseteq S_1(x)$, $\forall x \in X$.

We consider a fixed $x$, and drop the $x$ arguments from $S_1$ and $S_2$. We may assume that $S_2$ is not empty, for if it is empty then $S_1$ is also empty, and the proof is complete. Take $k' \in S_2$. Suppose that $k' \notin S_1$. Then there exists $\beta' \in B$ such that $f(x,\beta') < k'$. But by the upper semicontinuity of $f(x,\beta)$ there exists a neighborhood $N'$ of $\beta'$ such that $f(x,\beta) < k'$ for all $\beta \in N' \cap B$. By the positivity of $\mu$ on $B$, this contradicts the fact that $k' \in S_2$.

The F and A criteria have the interesting property that one may perform a continuous monotonic transformation on $f(x,\beta)$ without altering the decisions which satisfy these criteria. This certainly is not true of the next criterion we shall discuss, the expected value criterion. We emphasize this point in

Proposition 7:

Let $g(t)$ be any strictly increasing and continuous function defined from $R^1$ into $R^1$. Then (i) the set of decisions which satisfy criterion $F(\alpha)$ does not alter if $f(x,\beta)$ is replaced by $g(f(x,\beta))$, and (ii) the set of decisions which satisfy criterion $A(M)$ does not alter if $f(x,\beta)$ is replaced by $g(f(x,\beta))$ and $M$ is replaced by $g(M)$.

Proof: Observe that $f(x,\beta) < k$ if and only if $g(f(x,\beta)) < g(k)$, since $g$ is invertible and strictly increasing. Hence $\{\beta : f(x,\beta) < k\} = \{\beta : g(f(x,\beta)) < g(k)\}$, and so $\mathrm{Prob}[f(x,\beta) < k] = \mathrm{Prob}[g(f(x,\beta)) < g(k)]$. This yields (ii). To see (i), write

$$\begin{aligned}
\mathrm{Sup}\{k : \mathrm{Prob}[f(x,\beta) < k] \le \alpha\}
&= \mathrm{Sup}\{k : \mathrm{Prob}[g(f(x,\beta)) < g(k)] \le \alpha\} \\
&= \mathrm{Sup}\{g^{-1}(g(k)) : \mathrm{Prob}[g(f(x,\beta)) < g(k)] \le \alpha\} \\
&= g^{-1}(\mathrm{Sup}\{g(k) : \mathrm{Prob}[g(f(x,\beta)) < g(k)] \le \alpha\}) \\
&= g^{-1}(\mathrm{Sup}\{t : \mathrm{Prob}[g(f(x,\beta)) < t] \le \alpha\}).
\end{aligned}$$

Finally,

$$\operatorname*{Max}_{x \in X} \; [\mathrm{Sup}\{k : \mathrm{Prob}[f(x,\beta) < k] \le \alpha\}] = g^{-1}\Big(\operatorname*{Max}_{x \in X} \; [\mathrm{Sup}\{k : \mathrm{Prob}[g(f(x,\beta)) < k] \le \alpha\}]\Big).$$

Maximum Expected Payoff Criterion

The F and A criteria are designed to achieve the lower envelope of the family of c.d.f.'s $\{\mathrm{Prob}[f(x,\beta) < k]\}_{x \in X}$ at a single point, in an attempt to "pin down" a c.d.f. to lie "close" to the lower envelope. Another approach would be to use the area above the lower envelope and below a candidate c.d.f. as a measure of "closeness."

Criterion E: Choose an $x \in X$ which determines the c.d.f. with the least area below it and above the lower envelope.

We shall show now that this geometrically motivated criterion is equivalent to the maximum expected payoff criterion:

$$\operatorname*{Maximize}_{x \in X} \; E[f(x,\beta)]. \tag{4.7}$$

Proposition 8:

Criterion E is equivalent to the maximum expected payoff criterion.

Proof: The proof is a simple consequence of the geometric interpretation of the mean of a random variable in terms of the graph of its cumulative distribution function. In Figure 5, the mean of the random variable $Y$ is area 1 minus area 2 (see Parzen, 1960, p. 211).

Denote by $A(x)^+$ the area corresponding to area 1 of Figure 5 for the c.d.f. $\mathrm{Prob}[f(x,\beta) < k]$, and by $A(x)^-$ the area corresponding to area 2. Similarly, denote by $A^+$ and $A^-$ the areas above and below the lower envelope of all such c.d.f.'s. Then the maximum expected payoff criterion may be written

$$\operatorname*{Maximize}_{x \in X} \; [A(x)^+ - A(x)^-],$$

and Criterion E may be written

$$\operatorname*{Minimize}_{x \in X} \; [(A(x)^- - A^-) + (A^+ - A(x)^+)].$$

Clearly these two problems lead to the same decisions.
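The area argument of Proposition 8 can be checked on finite distributions. The sketch below assumes that area 1 is the area above the c.d.f. (and below 1) to the right of zero, and that area 2 is the area below the c.d.f. to the left of zero; the three distributions and the function names are hypothetical. It computes each mean as area 1 minus area 2, and also the area between each c.d.f. and the lower envelope.

```python
def cdf_strict(dist, k):
    """F(k) = Prob[Y < k] for a discrete distribution {value: probability}."""
    return sum(p for v, p in dist.items() if v < k)

def mean_from_areas(dist):
    """Mean computed as (area 1) - (area 2) from the step-function c.d.f."""
    pts = sorted(set(dist) | {0})
    area1 = area2 = 0.0
    for lo, hi in zip(pts, pts[1:]):
        F = cdf_strict(dist, hi)          # constant value of F on (lo, hi]
        if lo >= 0:
            area1 += (1.0 - F) * (hi - lo)
        else:                             # hi <= 0 because 0 is a breakpoint
            area2 += F * (hi - lo)
    return area1 - area2

def area_above_envelope(dists, x):
    """Area below the c.d.f. of decision x and above the lower envelope."""
    pts = sorted({v for d in dists.values() for v in d})
    total = 0.0
    for lo, hi in zip(pts, pts[1:]):
        envelope = min(cdf_strict(d, hi) for d in dists.values())
        total += (cdf_strict(dists[x], hi) - envelope) * (hi - lo)
    return total

# Hypothetical payoff distributions for three decisions.
dists = {
    "a": {-4: 0.25, 2: 0.50, 6: 0.25},
    "b": {0: 0.50, 4: 0.50},
    "c": {1: 1.00},
}
for x in dists:
    print(x, mean_from_areas(dists[x]), area_above_envelope(dists, x))
```

In this example the mean plus the area above the envelope is the same constant for every decision, so minimizing the area and maximizing the mean select the same decision.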


There is an obvious and fortunate relationship between the maximum expected payoff criterion and the estimate criterion which sometimes permits one to choose an estimate in a simple way so that the estimate criterion is satisfied by the same set of decisions as the expected payoff criterion.

Proposition 9:

Assume that $f(x,\beta)$ can be written as

$$f(x,\beta) = F_1(x) + F_2(\beta) + \sum_i H_i(x)\,\beta_i.$$

Then the estimate criterion with $\hat\beta = E[\beta]$ is satisfied by the same set of decisions as the maximum expected payoff criterion.

Proof: The maximum expected payoff criterion gives

$$\operatorname*{Maximize}_{x \in X} \; E\Big[F_1(x) + F_2(\beta) + \sum_i H_i(x)\,\beta_i\Big]$$

or

$$\operatorname*{Maximize}_{x \in X} \; \Big[F_1(x) + E[F_2(\beta)] + \sum_i H_i(x)\,E[\beta_i]\Big].$$

The estimate criterion with $\hat\beta = E[\beta]$ gives

$$\operatorname*{Maximize}_{x \in X} \; \Big[F_1(x) + F_2(E[\beta]) + \sum_i H_i(x)\,E[\beta_i]\Big].$$

Since the $F_2$ terms of each problem do not contain $x$, they may be deleted, and hence the two criteria lead to identical sets of decisions.

When the above proposition applies, we say that the estimate $\hat\beta = E[\beta]$ is a certainty equivalent with respect to the maximum expected payoff criterion. Other results in the same vein are given by Reiter (1957), Simon (1956), and Theil (1964).

It is easy to see from Proposition 8 that any decision which satisfies the maximum expected payoff criterion must be stochastically admissible, for a decision that stochastically dominated it would determine a c.d.f. with strictly less area above the lower envelope.

It is also worth noting that the expected value criterion leads to the same decisions when applied to payoff as when applied to regret. In general this is not true for criteria $A(M)$ and $F(\alpha)$.

3.3 An Example

We present a simple inventory model as an illustration of the ideas of this section and as a vehicle for further discussion. Consider a firm stocking and selling a single commodity for a single period of time. We use the notation

$x$ = number of units to be ordered in advance of the demand
$\beta$ = unknown demand level during the period
$c$ = cost per unit
$r$ = revenue per unit ($r > c$)
$v$ = salvage value per unit left at end of period ($v < c$)
$f(x,\beta)$ = total profit
$X = [0,\infty)$
$B = [0,\beta_{MAX}]$, with $\beta_{MAX}$ chosen sufficiently large to account for the largest likely demand

The payoff and regret are given by

$$f(x,\beta) = \begin{cases} (r-c)\beta - (x-\beta)(c-v) & \text{if } \beta \le x \\ (r-c)x & \text{if } \beta \ge x \end{cases}$$

$$r(x,\beta) = \begin{cases} (c-v)(x-\beta) & \text{if } \beta \le x \\ (r-c)(\beta-x) & \text{if } \beta \ge x. \end{cases}$$

First we examine the criteria not involving probabilities over the set of possible demand levels. All choices for $x \in X$ are readily seen to be admissible. The maxmin payoff criterion leads to the decision to order zero units, since $\operatorname*{Min}_{\beta \in B} f(x,\beta) = -(c-v)x$. When this criterion is applied to regret, however, it (minmax regret) leads to the decision to order $[(r-c)/(r-v)]\,\beta_{MAX}$. This is the same decision that the Principle of Insufficient Reason would give if we interpret it as putting a uniform distribution over $B$. The estimate criterion leads to a trivial maximization problem once an estimate $\hat\beta$ is chosen, and indicates that we should order exactly $x = \hat\beta$.

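Next we examine the criteria involving probabilities over the set of possible demand levels. In order to plot the cumulative distributions of payoff for various candidate $x$'s, we need to know the set of $\beta$'s for which the payoff is less than $k$:

$$\{\beta : \beta \ge 0, \; f(x,\beta) < k\} = \begin{cases} \emptyset & \text{if } k \le -(c-v)x \\ \left[0, \dfrac{k+(c-v)x}{r-v}\right) & \text{if } -(c-v)x < k \le (r-c)x \\ [0,\infty) & \text{if } k > (r-c)x. \end{cases}$$

Using the fact that $x$ is non-negative, we have for $k \ge 0$

$$\mathrm{Prob}[f(x,\beta) < k] = \begin{cases} 1 & \text{if } x < \dfrac{k}{r-c} \\ \displaystyle\int_0^{\frac{k+(c-v)x}{r-v}} d\mu & \text{if } x \ge \dfrac{k}{r-c}. \end{cases}$$

For $k < 0$,

$$\mathrm{Prob}[f(x,\beta) < k] = \begin{cases} 0 & \text{if } x \le \dfrac{-k}{c-v} \\ 1 - \displaystyle\int_{\frac{k+(c-v)x}{r-v}}^{\infty} d\mu & \text{if } x > \dfrac{-k}{c-v}. \end{cases}$$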

The lower envelope may be obtained by solving, for all real $k$, the problem

$$\operatorname*{Minimize}_{x \ge 0} \; \mathrm{Prob}[f(x,\beta) < k].$$

This problem has a very simple solution for this example. For $k < 0$, the minimum is zero and is achieved for $0 \le x \le |k|/(c-v)$. For $k \ge 0$, the minimum is $1 - \int_{k/(r-c)}^{\infty} d\mu$ and is achieved for $x = k/(r-c)$.

Assume for computational simplicity that the demand is exponentially distributed with mean 10, that $(c-v) = 1/2$, and that $(r-c) = 3/2$. Then for $k \ge 0$, the lower envelope has height $[1 - \exp(-0.067\,k)]$, and is achieved at $x = 2k/3$. (12/) Figure 6 illustrates the lower envelope and a few sample c.d.f.'s. Observe that each c.d.f. jumps to the value 1 as soon as it attains the lower envelope, and that every $x \ge 0$ is stochastically admissible.

We are now in a position to read off the "optimal" decisions corresponding to criteria $A(M)$ and $F(\alpha)$ for any choice of $M$ or $\alpha$. $A(M^\circ)$ leads to the unique choice of $x = M^\circ/(r-c)$, and $F(\alpha^\circ)$ leads to the unique choice $x = -10 \ln(1-\alpha^\circ)$. In this particular example, these criteria do not fulfill their promise of "pinning down" a c.d.f. to lie close to the lower envelope, because each c.d.f. is discontinuous at the point at which it achieves the lower envelope.

12/ Note that the lower envelope is the c.d.f. of an exponential distribution with mean 15.

The maximum expected payoff criterion may be applied by setting the derivative of $E[f(x,\beta)]$ equal to zero and solving for $x$. This computation leads to the well-known (Dvoretzky, Kiefer, and Wolfowitz, 1952) result that one should choose the value of $x$ corresponding to the $(r-c)/(r-v)$-th fractile of $\mu$. That is, $x^*$ should satisfy $\int_0^{x^*} d\mu = (r-c)/(r-v)$. For the data assumed above, $x^* = 13.8$. It is interesting to observe that if $\mu$ were uniform on $[0,\beta_{MAX}]$, then the minmax regret criterion would lead to exactly the same action as would the maximum expected payoff criterion.
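The payoff results of this example can be reproduced with a short computation. The sketch below assumes the stated data (exponential demand with mean 10, c − v = 1/2, r − c = 3/2). The function names are illustrative, and the closed forms for the fractile and the expected payoff are derived here from the case analysis above rather than quoted from the text.

```python
import math

MEAN = 10.0                        # mean of the exponential demand distribution
C_MINUS_V = 0.5                    # c - v
R_MINUS_C = 1.5                    # r - c
R_MINUS_V = R_MINUS_C + C_MINUS_V  # r - v

def demand_cdf(b):
    """Prob[beta < b] for exponential demand."""
    return 1.0 - math.exp(-b / MEAN) if b > 0 else 0.0

def payoff_cdf(x, k):
    """Prob[f(x, beta) < k], following the three cases for the set of beta's."""
    if k <= -C_MINUS_V * x:
        return 0.0
    if k > R_MINUS_C * x:
        return 1.0
    return demand_cdf((k + C_MINUS_V * x) / R_MINUS_V)

def payoff_fractile(x, alpha):
    """Sup{k : Prob[f(x, beta) < k] <= alpha} for order quantity x, 0 <= alpha < 1."""
    interior = -R_MINUS_V * MEAN * math.log(1.0 - alpha) - C_MINUS_V * x
    return min(R_MINUS_C * x, interior)

def expected_payoff(x):
    """E[f(x, beta)], using f = (r-v) min(beta, x) - (c-v) x."""
    return R_MINUS_V * MEAN * (1.0 - math.exp(-x / MEAN)) - C_MINUS_V * x

x_star = -MEAN * math.log(1.0 - R_MINUS_C / R_MINUS_V)   # (r-c)/(r-v) fractile of demand
x_F = -MEAN * math.log(1.0 - 0.1)                         # choice under F(0.1)
x_A = 6.0 / R_MINUS_C                                     # choice under A(M) with M = 6
print(round(x_star, 2), round(x_F, 3), x_A)
print(payoff_cdf(x_A, 6.0), 1.0 - math.exp(-6.0 / 15.0))  # c.d.f. meets the lower envelope
```

The exact optimum is 10 ln 4, about 13.86, which the text reports as 13.8.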

Next we carry out a parallel analysis in terms of regret rather than payoff. It will be seen that $A(M)$ and $F(\alpha)$ are more appealing when applied to the regret distributions. An argument will be presented for choosing a value of $x$ other than that which minimizes expected regret (which, of course, is equivalent to maximizing the expected payoff, the now classical solution to this problem).


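We have, for $k > 0$,

$$\{\beta : \beta \ge 0, \; r(x,\beta) < k\} = \begin{cases} \left[0, \; x + \dfrac{k}{r-c}\right) & \text{if } x < \dfrac{k}{c-v} \\ \left(x - \dfrac{k}{c-v}, \; x + \dfrac{k}{r-c}\right) & \text{if } x \ge \dfrac{k}{c-v}. \end{cases}$$

Since we are dealing in terms of regret, rather than payoff, we seek the upper envelope rather than the lower envelope. It is obtained by maximizing, for all $k > 0$, $\mathrm{Prob}[r(x,\beta) < k]$:

$$\operatorname*{Maximize}_{x \ge 0} \; \int_{\max\{0, \; x - k/(c-v)\}}^{x + k/(r-c)} d\mu.$$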

Since the exponential density is monotone decreasing, the maximum is easily seen to be achieved at $x = k/(c-v)$. The height of the upper envelope is therefore equal to $\mathrm{Prob}[\beta < k/(c-v) + k/(r-c)]$. For the data given previously, this quantity is computed to be $[1 - \exp(-0.267\,k)]$, and the upper envelope is achieved for $x = 2k$.

Figure 7 is the counterpart of Figure 6. Note that the c.d.f.'s are continuous, so that $A(M)$ and $F(\alpha)$ are more effective in their endeavor to "pin up" a c.d.f. to lie near the upper envelope.

For a given value of $x$, it is a straightforward matter to calculate the expected regret and the $\alpha$-fractile. This has been done for $\alpha = .95$ and some representative values of $x$ in Figure 8. The striking feature of this graph is that large relative changes in .95-fractile are available with only small relative changes in expected regret, with the result that it becomes attractive to deviate from the ordinary minimum expected regret solution to the problem. For example, consider $x = 13.8$ (which yields the minimum expected regret) in comparison with $x = 20$. The former has an expected regret of 6.9 and a .95-fractile of 24.1, whereas the latter has an expected regret of 7.7 and a .95-fractile of 14.8. That is, by choosing $x = 20$ instead of 13.8, one may achieve a 38.5% decrease in .95-fractile at the expense of only 11.6% increase in expected regret; for $x = 18$ instead of 13.8, the percentages become 26.1% and 5.9%.

[Figure 7: $\mathrm{Prob}[r(x,\beta) < k]$]
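The regret figures quoted in the comparison of order quantities can be approximately reproduced as follows. This is a minimal sketch assuming the same data as before (exponential demand with mean 10, c − v = 1/2, r − c = 3/2). The closed form for expected regret and the bisection routine are constructions introduced here, not the author's procedure.

```python
import math

MEAN, C_MINUS_V, R_MINUS_C = 10.0, 0.5, 1.5

def expected_regret(x):
    """E[r(x, beta)] for exponential demand, in closed form."""
    shortage = MEAN * math.exp(-x / MEAN)               # E[(beta - x)^+]
    overage = x - MEAN * (1.0 - math.exp(-x / MEAN))    # E[(x - beta)^+]
    return C_MINUS_V * overage + R_MINUS_C * shortage

def regret_cdf(x, k):
    """Prob[r(x, beta) < k] = Prob[x - k/(c-v) < beta < x + k/(r-c)], beta >= 0."""
    if k <= 0:
        return 0.0
    lo = max(0.0, x - k / C_MINUS_V)
    hi = x + k / R_MINUS_C
    return math.exp(-lo / MEAN) - math.exp(-hi / MEAN)

def regret_fractile(x, alpha, tol=1e-9):
    """Solve regret_cdf(x, k) = alpha by bisection (the c.d.f. is continuous, increasing in k)."""
    lo, hi = 0.0, 1.0
    while regret_cdf(x, hi) < alpha:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if regret_cdf(x, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_star = MEAN * math.log(4.0)   # minimizer of expected regret, about 13.86
for x in (x_star, 18.0, 20.0):
    print(round(x, 2), round(expected_regret(x), 2), round(regret_fractile(x, 0.95), 2))
```

Under these assumptions the computation gives roughly 6.9 and 24.1 at the minimum-expected-regret order quantity and roughly 7.7 and 14.9 at x = 20, which agrees with the quoted values to within about 0.1.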


This example shows a special instance of what is likely to be a quite general situation: in the neighborhood of the decision indicated by the maximum expected payoff criterion, it is possible to substantially improve the $\alpha$-fractile or aspiration levels of payoff or regret without lowering the expected payoff very much. Such possibilities ought to be investigated and exploited when found to be relevant to the decision-maker's objectives.

3.4  Vector  Maximum  Reformulations 

The "ideal" decision criterion is analogous to the much-sought philosophers' stone of medieval times, and seems about as likely to exist. We suggest that one might profitably consider, in a given application, two or even three plausible criteria (not necessarily the ones discussed herein) and reformulate (4) as a vector maximum problem. The solution of this vector maximum problem would reveal clearly the tradeoffs involved between the criteria, and a decision may be chosen in an ad hoc manner from the efficient candidates. For example, if a situation such as Figure 9 occurs, one would probably choose an efficient solution nearer to point B than to point A, for a large gain in criterion 2 can be achieved at the expense of a relatively small loss in criterion 1.


Figure  9  (to  be  maximized) 



One combination of criteria which seems particularly plausible
when a probability distribution over B is available is the α-fractile
criterion with the expected value criterion. With α small, the
first criterion tends to control the lower tail of the distribution
of payoffs, while the second tends to control the mean. Such a combination
might be used to program a mutual investment fund, for example,
for the possibility of ruin or large losses seems to loom as a separate
dimension of utility from the average growth rate. Markowitz (1956)
had precisely this viewpoint in mind for his well-known portfolio
problem, except that he used variance in place of the α-fractile.

Hodges and Lehmann (1952) proposed essentially this combination
of criteria, except that they took α equal to zero. Letting α
rise above zero seems to avoid some of the excessive conservatism in
their formulation, while keeping the aim of protection against large
losses.
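
The two-criterion reformulation suggested above can be illustrated on a
small finite example. The sketch below is not taken from the text: the
scenario probabilities, the payoff table, and the decision names are
hypothetical, and the α-fractile is assumed to mean the largest k with
Prob[payoff < k] ≤ α. Each decision is scored on both criteria and the
efficient decisions are then listed.

```python
# Hypothetical data: three scenarios for the uncertain parameters,
# with assumed probabilities, and three candidate decisions.
PROBS = [0.1, 0.4, 0.5]
PAYOFFS = {
    "A": [-10.0, 5.0, 12.0],
    "B": [2.0, 4.0, 8.0],
    "C": [0.0, 3.0, 6.0],
}


def expected_value(payoffs, probs):
    """Mean payoff of one decision over the scenarios."""
    return sum(p * v for p, v in zip(probs, payoffs))


def alpha_fractile(payoffs, probs, alpha):
    """Largest k with Prob[payoff < k] <= alpha (assumed definition)."""
    cumulative = 0.0
    for value, prob in sorted(zip(payoffs, probs)):
        cumulative += prob
        if cumulative > alpha:
            return value
    return max(payoffs)


def efficient(scores):
    """Names whose score vector is not dominated by any other score vector."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and a != b
    return sorted(name for name, s in scores.items()
                  if not any(dominates(t, s) for t in scores.values()))


ALPHA = 0.05
SCORES = {name: (alpha_fractile(v, PROBS, ALPHA), expected_value(v, PROBS))
          for name, v in PAYOFFS.items()}
EFFICIENT = efficient(SCORES)
print(SCORES)
print(EFFICIENT)
```

In this made-up table decision A has the best mean but the worst lower
tail, B gives up some mean for a much better tail, and C is dominated by
B, so only A and B remain as efficient candidates.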


CHAPTER  II 


Reducing  a  Vector  Maximum  Problem  to  a 
Parametric  Programming  Problem 

In this chapter it is assumed that uncertainty has been removed
from a decision problem by means of devices such as those discussed
in the first chapter, and that it is desired to solve the vector
maximum problem,

(1)  "Maximize"  f(x) ,
        x ∈ X

where f(x) = (f_1(x), ..., f_r(x)), x is an n-vector, and X is
a given set of feasible decisions. Recall that "solving" (1) means
finding all efficient decisions, where a feasible decision x° is
called efficient if there exists no feasible decision x' such that
f(x') ≥ f(x°).1/ We shall discuss two ways of reducing (1) to a
parameterized family of ordinary (one criterion function) mathematical
programming problems, or "parametric" programming problems. Existing
computational methods for these problems will be indicated.

This  chapter  is  intended  to  serve  as  a  bridge  between  the  study 
of  decision  problems  under  uncertainty,  which  was  the  topic  of  the 
first  chapter,  and  the  study  of  a  class  of  algorithms  for  parametric 
programming,  which  is  the  topic  of  the  third  chapter. 

1/ Recall that by this notation we mean f_i(x') ≥ f_i(x°) (i = 1,...,r)
with f_i(x') > f_i(x°) for some i (see Footnote 1, Chapter I).
1. Reducing (1) to a Problem Parametric in the Constraints

From the definition of an efficient decision for (1), it is
easy to see that a feasible decision x° is efficient if and only if
x° is an optimal solution to each of the r problems

(2i)  Maximize  f_i(x)
        x ∈ X
      subject to  f_j(x) ≥ f_j(x°) ,  j = 1, ..., r  but  j ≠ i ,

i = 1, ..., r. It follows immediately that the following assertion
holds.

Proposition 1:

Let 1 ≤ i_o ≤ r be fixed. If x° is efficient in (1), then
there exists an (r-1)-vector δ such that x° is an optimal
solution of (3i_o), where (3i) is given by

(3i)  Maximize  f_i(x)
        x ∈ X
      subject to  f_j(x) ≥ δ_j ,  j = 1, ..., r  but  j ≠ i .

This proposition suggests a method for finding all efficient
decisions. Taking r = 2 and i_o = 1, for example, we find the
set of all efficient decisions among the totality of optimal solutions
to

(3)  Maximize  f_1(x)
       x ∈ X
     subject to  f_2(x) ≥ δ

as δ varies over (-∞, +∞). Often f_2(x) is bounded from above
on X, and so the interval of parametric variation does not extend
to +∞. Likewise when f_2(x) is bounded from below on X, or when
the maximum of f_1(x) is achieved for some value of x, the
interval of parametric variation need not extend to -∞.

This method yields not only all efficient decisions, but possibly
some inefficient ones as well, since it may be possible to increase
f_2(x) without decreasing f_1(x) below its maximum value for a particular
value of δ. A similar remark holds a fortiori for r > 2.
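
The sweep over δ in (3), and the way it can also pick up inefficient
decisions, can be sketched on a finite decision set. Everything in the
example below (the decision names, the pairs (f_1, f_2), and the grid of
δ values) is hypothetical and chosen only for illustration; it is a
minimal sketch, not a general-purpose method.

```python
# Hypothetical finite decision set: name -> (f1, f2), both to be maximized.
DECISIONS = {"a": (5, 1), "b": (4, 3), "c": (2, 4), "d": (4, 2), "e": (1, 1)}


def solve_constrained(delta):
    """All optimal solutions of: maximize f1 subject to f2 >= delta."""
    feasible = {n: f for n, f in DECISIONS.items() if f[1] >= delta}
    if not feasible:
        return set()
    best = max(f[0] for f in feasible.values())
    return {n for n, f in feasible.items() if f[0] == best}


def is_efficient(name):
    """True if no other decision is at least as good in both criteria."""
    f = DECISIONS[name]
    return not any(g != f and g[0] >= f[0] and g[1] >= f[1]
                   for g in DECISIONS.values())


# Sweep delta over a grid covering the range of f2.
GRID = [0.5 * k for k in range(0, 11)]          # 0.0, 0.5, ..., 5.0
CANDIDATES = set().union(*(solve_constrained(d) for d in GRID))
EFFICIENT = {n for n in CANDIDATES if is_efficient(n)}
print(sorted(CANDIDATES), sorted(EFFICIENT))
```

Here decision d ties with b in f_1 whenever the constraint level admits
both, so it appears among the candidates even though b dominates it; the
culling step removes it.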

Culling out the inefficient decisions when r = 2 is easily done,
in principle, by viewing the graph of (f_1(x), f_2(x)) for all candidate
decisions generated by the method. For r > 2, graphical analysis
rapidly becomes impractical, and one must rely on sufficient conditions
such as those given in


Proposition 2:

Let 1 ≤ i_o ≤ r and the (r-1)-vector δ° be fixed, and let
x° be an optimal solution to (3i_o) with δ = δ°. If any of
the following three conditions is satisfied, then x° is
efficient in (1).

(i) x° is also an optimal solution of the r-1 problems
(3i), i ≠ i_o, with δ_j = f_j(x°), j = 1, ..., r but j ≠ i.

(ii) x° is the unique optimal solution to (3i_o) with
δ = δ°.

(iii) x° is the unique optimal solution to (3i_o) with
δ_j = f_j(x°), j ≠ i_o.

Proof: If (i) is satisfied, x° is efficient in (1) by the
opening remark of this section.

Assume that (ii) is satisfied, and suppose that x° is not
efficient. Then there exists x' ∈ X such that f(x') ≥ f(x°),
which implies that x' is feasible and optimal in (3i_o) with
δ = δ°, thus contradicting the unique optimality of x°. Hence
x° is efficient.

Assume that (iii) is satisfied. Since x° also is an optimal solution
of (3i_o) with δ_j = f_j(x°), j ≠ i_o, the argument apropos (ii) applies.

Under additional hypotheses, Propositions 1 and 2 can be combined
to give

Proposition 3:

Let 1 ≤ i_o ≤ r be fixed. Assume that f_{i_o} is strictly concave,
f_j (j ≠ i_o) is concave, and X is convex.2/ Then x° is
efficient in (1) if and only if x° solves (3i_o) for some (r-1)-
vector δ.

Proof: Necessity was proven in Proposition 1. To prove sufficiency,
apply A.2 of Appendix A and part (ii) of Proposition 2.

2. Reducing (1) to a Problem Parametric in the Objective Function

We shall give some conditions under which (1) can be reduced to
a family of problems of the form

(4)  Maximize  Σ_{i=1}^{r} v_i f_i(x)
       x ∈ X

where v ≥ 0 is a vector-valued parameter.

2/ See Appendix A for definitions of convex sets and concave functions,
and some properties thereof which will be used freely in the sequel.

Proposition 4:

(i) If v > 0 and x° is an optimal solution to (4), then
x° is efficient in (1).

(ii) If v ≥ 0 and x° is the unique optimal solution of
(4), then x° is efficient in (1).

Proof: Suppose that (i) is false. Then there exists x' ∈ X
such that f(x') ≥ f(x°); since v > 0, this implies that
Σ v_i f_i(x') > Σ v_i f_i(x°), thus contradicting the optimality of x°
in (4). This proves (i).

Suppose that (ii) is false. Then there exists x' ∈ X, x' ≠ x°,
such that f(x') ≥ f(x°); since v ≥ 0, this implies that
Σ v_i f_i(x') ≥ Σ v_i f_i(x°), thus contradicting the unique optimality
of x° in (4). This proves (ii).

Proposition 5:3/

Let X be convex, let f_i(x) be concave, i = 1,...,r, and
let x° be efficient in (1). Then there exists an r-vector
v° ≥ 0 (v° ≠ 0) such that x° is an optimal solution of (4) with v = v°.

3/ The earliest statement and proof of a theorem of this type seems to
be due to Kuhn and Tucker (1951). An elegant proof of this proposition
has been given by Karlin (1959, p. 217). For the sake of completeness
we record a slightly different version of that proof here.
Proof: Put P = {p ∈ E^r: p_i ≥ f_i(x°), i = 1,...,r}. Clearly P is convex.
Put Z = {z ∈ E^r: z_i ≤ f_i(x), i = 1,...,r, for some x ∈ X}. Z is convex, for
let z', z" ∈ Z and let 0 ≤ λ ≤ 1. By the definition of Z there
exist x', x" ∈ X such that z' ≤ f(x') and z" ≤ f(x") componentwise. Hence

(λz' + (1-λ)z") ≤ λf(x') + (1-λ)f(x") ≤ f(λx' + (1-λ)x") ,

where the last inequality follows from the concavity of f(x). Since
(λx' + (1-λ)x") ∈ X by the convexity of X, (λz' + (1-λ)z") ∈ Z.
This shows that Z is convex.

Because x° is efficient, Z ∩ P is the single point f(x°),
so that Z and P have no interior points in common. Hence we may
apply the well-known Theorem of the Separating Hyperplane (see A.7,
Appendix A) to assert the existence of an r-vector v° ≠ 0 and a
scalar c such that

Σ v°_i z_i ≤ c ≤ Σ v°_i p_i ,  ∀ z ∈ Z, p ∈ P .

The right-hand inequality and the definition of P imply that
v°_i ≥ 0 for each i, for otherwise the sum Σ v°_i p_i would be unbounded
from below. By the definition of Z, the left-hand inequality yields
Σ v°_i f_i(x) ≤ c, ∀ x ∈ X. Taking p = f(x°), we have
Σ v°_i f_i(x) ≤ c ≤ Σ v°_i f_i(x°), ∀ x ∈ X, which is equivalent to the
assertion that x° is an optimal solution of (4) with v = v°.

When the hypotheses of Proposition 5 hold, one is sure to find
all efficient decisions for (1) among the totality of optimal decisions
for (4) as v ranges over all non-negative values. Notice that
without loss of generality one may take Σ v_i = 1 in (4), since for
fixed v ≥ 0 (v ≠ 0) the objective function of that problem can be scaled
by a factor of 1/Σ v_i without affecting the set of optimal solutions.
Hence v is really only an (r-1)-dimensional parameter. When r = 2,
for example, (4) reduces to the parametric problem

(4.1)  Maximize  v f_1(x) + (1-v) f_2(x)   for each 0 ≤ v ≤ 1 .
         x ∈ X
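
Proposition 4(i), and the role played by the convexity hypothesis in
Proposition 5, can be seen on a small example. The sketch below uses a
hypothetical finite (hence non-convex) decision set; the point names and
criterion values are made up. Every maximizer of (4.1) found with
0 < v < 1 is efficient, yet one efficient point is never returned for
any weight, which is exactly the situation that Proposition 5 excludes
when X is convex and the f_i are concave.

```python
# Hypothetical finite (hence non-convex) decision set: name -> (f1, f2).
POINTS = {"a": (4.0, 0.0), "b": (0.0, 4.0), "c": (1.5, 1.5), "d": (1.0, 1.0)}


def weighted_optima(v):
    """All maximizers of v*f1 + (1-v)*f2 over POINTS."""
    score = {n: v * f[0] + (1.0 - v) * f[1] for n, f in POINTS.items()}
    best = max(score.values())
    return {n for n, s in score.items() if abs(s - best) < 1e-12}


def is_efficient(name):
    """True if no other point is at least as good in both criteria."""
    f = POINTS[name]
    return not any(g != f and g[0] >= f[0] and g[1] >= f[1]
                   for g in POINTS.values())


# Strictly positive weights only, so Proposition 4(i) applies.
WEIGHTS = [k / 20.0 for k in range(1, 20)]      # 0.05, 0.10, ..., 0.95
FOUND = set().union(*(weighted_optima(v) for v in WEIGHTS))
ALL_EFFICIENT = {n for n in POINTS if is_efficient(n)}
print(sorted(FOUND), sorted(ALL_EFFICIENT))
```

Point c is efficient but lies below the line joining a and b, so no
weighted combination of the two criteria ever selects it.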

By strengthening the hypotheses of Proposition 5, the last two
propositions can be combined to give

Proposition 6:

Let X be convex, and let f_i(x) (i = 1,...,r) be strictly
concave. Then x° is efficient in (1) if and only if x°
solves (4) for some v ≥ 0 (v ≠ 0).

Proof: Necessity was proven in Proposition 5. To prove sufficiency,
apply A.2, A.4, and part (ii) of Proposition 4.

3. Computational Methods for Parametric Problems

A very common approach for a decision-maker to take, when faced
with solving a multi-criterion problem such as (1), is to reformulate
(1) in the form of (3i) or (4) (or possibly a combination of the two)
with δ or v fixed at some value of particular interest. Problem
(3i) corresponds to selecting and retaining the most important criterion
function and putting the rest in as constraints so that the remaining
criteria each meet at least some minimally acceptable level.4/

4/ For an early and important example of this, see Neyman and Pearson
(1933), who employed this device as a cornerstone of their theory of
statistical hypothesis testing.


Problem (4) corresponds to maximizing a weighted combination of criteria
which is designed to reflect the relative importance of each. Such
an approach offers computational simplicity in comparison with a
complete solution of (1), since just one ordinary maximization problem
has to be solved. After (3i) or (4) has been solved for the selected
δ° or v°, the value of δ or v may be varied in a neighborhood
of δ° or v° in order to ascertain how the corresponding optimal
decisions and payoff function vary. This is a type of "sensitivity
analysis." The above propositions relate this type of sensitivity
analysis to the partial solution of (1) in the vector maximum sense.

Whether for purposes of sensitivity analysis or of solving (1),
solution methods are required for the parametric problems associated
with (3i) and (4). Since analytic methods can be expected to have
very limited applicability (if experience with non-parametric
mathematical programming is any guide), numerical methods must be employed.
In this regard, we are obliged to limit our consideration to problems
for which X is convex and f_i(x) (i = 1,...,r) is concave, for
most known programming algorithms5/ require at least convexity of
the feasible region and concavity of the objective function. We shall
further limit our consideration to the important case r = 2, because
the vastness of the parameter space increases so rapidly with r as
to preclude the reasonable hope of solving parametric problems even
to reasonable approximation when r is much larger than 2 or 3.

5/ For surveys of (nonlinear) programming algorithms, see, e.g.,
Dorn (1963), Hadley (1964), Saaty and Bram (1964, Chapter 3), Wolfe
(1962), and Zoutendijk (1960).


We now indicate some existing computational methods, and point out
the need for the developments of the next chapter.

If X is a convex polyhedron (i.e., the feasible region is
determined by a set of linear equalities or inequalities), then several
efficient parametric programming algorithms are available for certain
special classes of criterion functions: when f_1 and f_2 are both
linear functions, parametric versions of (3) and (4.1) can be solved
by parametric linear programming (Gass, 1955); when f_1 is linear
and f_2 is a quadratic polynomial,6/ the algorithms of Houthakker
(1960), Markowitz (1956), and Wolfe (1959) are available;7/ when f_1
and f_2 are both quadratic polynomials, an algorithm of Zahl (1964)
essentially solves (4.1), although it seems possible to improve upon
the efficiency of his procedure by utilizing the developments of the
next chapter. Little if anything appears to have been done to devise
efficient algorithms for parametric problems involving more general
classes of criterion functions or feasible regions other than convex
polyhedra. The class of algorithms developed in Chapter III is
intended as a contribution in this direction. At the present state
of the art of parametric programming, however, one must fall back
upon more rudimentary methods.

6/ That is, the quadratic part of f_2(x) has the form x^t Q x, where t
denotes transpose and Q is a negative semidefinite matrix.

7/ See also Boot (1965a, 1965b).

In principle, if an algorithm is available which will solve
(3i) or (4) for any particular value of the parameter, then by
employing a suitably fine grid of parameter values one can obtain a
discrete approximation to the optimal solutions of the parametric
problem. This is a very straightforward approach, and for many
problems it may be fairly practical, since the optimal solution for
one parameter value can be expected to provide a nearly optimal
solution at the next parameter value on the grid. Because most
programming algorithms may be viewed as gradient methods, this
approach should provide roughly first order convergence between
optimal solutions at adjacent pairs of grid points.
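
The grid approach with warm starts can be sketched as follows. The
example is purely illustrative and rests on assumptions not found in the
text: one variable, X = [-2, 2], f_1(x) = -(x-1)^2, f_2(x) = -(x+1)^2,
and a fixed-step projected gradient ascent standing in for "an algorithm
which will solve (4.1) for any particular value of the parameter". It
compares the total iteration count when each grid point is started from
the previous optimum against starting every solve from the same point.

```python
def solve(v, x_start, step=0.25, tol=1e-8, max_iter=10_000):
    """Projected gradient ascent for max v*f1 + (1-v)*f2 on X = [-2, 2],
    with f1(x) = -(x-1)^2 and f2(x) = -(x+1)^2 (illustrative choices).
    Returns (approximate maximizer, number of iterations used)."""
    x = x_start
    for iteration in range(1, max_iter + 1):
        grad = -2.0 * v * (x - 1.0) - 2.0 * (1.0 - v) * (x + 1.0)
        x_new = min(2.0, max(-2.0, x + step * grad))
        if abs(x_new - x) < tol:
            return x_new, iteration
        x = x_new
    return x, max_iter


GRID = [k / 10.0 for k in range(11)]            # v = 0.0, 0.1, ..., 1.0


def sweep(warm_start):
    """Solve (4.1) at every grid point; return (solutions, total iterations)."""
    x, total, path = 2.0, 0, []
    for v in GRID:
        start = x if warm_start else 2.0        # cold start always uses x = 2
        x, used = solve(v, start)
        total += used
        path.append(x)
    return path, total


WARM_PATH, WARM_ITERS = sweep(warm_start=True)
COLD_PATH, COLD_ITERS = sweep(warm_start=False)
print(WARM_ITERS, COLD_ITERS)
```

For this toy problem the exact maximizer is x = 2v - 1, and the error of
the gradient iteration is halved at each step, which is the first order
behaviour referred to above.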

In the next chapter we offer an alternative to the last approach
under quite general assumptions on the criterion functions and the
feasible region. We shall develop a class of algorithms for solving
(4.1), a main member of which exhibits second order8/ convergence
between adjacent pairs of grid points.

8/ A sequence <x^n> which converges to x° exhibits first (second)
order convergence if the norm of the error at the n-th step is
asymptotically proportional to the (square of the) norm of the error
at the (n-1)-st step (see Appendix C, section 1).


CHAPTER  III 


A Class of Algorithms for Parametric Concave Programming
1.  Introduction  and  Preliminaries 

In this chapter we present a class of algorithms for solving parametric
concave programming problems of the form

(Pα)  Maximize  α f_1(x) + (1-α) f_2(x)
         x
      subject to  g(x) ≥ 0

for each α ∈ [0,1], where x is an n-vector, f_i(x) (i = 1,2) is
strictly concave,1/ and each component function of
g(x) = (g_1(x), ..., g_m(x)) is concave. Certain additional regularity
requirements are detailed in subsection 2.1.

Since our topic is parametric programming, rather than ordinary
(non-parametric) mathematical programming, we shall further assume
that an optimal solution of (Pα) is available for some value of α
in the unit interval. This assumption is in fact not restrictive,
for it is shown in subsection 1.1 that a parametric programming algorithm
for (Pα) which requires an optimal solution for some value of α
in order to "get started" can itself be used to generate such an
optimal solution.

1/ The algorithms to be given still apply if (in the following, ε > 0
is arbitrarily small): (a) f_1 is strictly concave and f_2 is
(non-strictly) concave and [0,1] is replaced by [ε,1], or (b) f_1 is
concave and f_2 is strictly concave and [0,1] is replaced by
[0,1-ε], or (c) α f_1 + (1-α) f_2 is strictly concave for each fixed
α ∈ (0,1) and [0,1] is replaced by [ε,1-ε].
The remainder of this section motivates (Pα) and the present
class of algorithms: in subsection 1.1 it is noted that (Pα) subsumes
the vector maximum problem for two criterion functions and also the
standard (non-parametric) concave programming problem, and in
subsection 1.2 the Kuhn-Tucker Theorem for nonlinear programming is
presented in slightly unconventional form so as to display clearly
the foundation upon which the present class of algorithms is built.

Section 2 is devoted to presenting and proving a Basic Conceptual
Algorithm for solving (Pα) for each value of α in the unit interval.
Three graphical examples are given in Appendix B. The development
of this conceptual algorithm into a Basic Computational Algorithm, via the
use of Newton's method for solving the relevant systems of equations,
is the subject of section 3. Some necessary computational devices are
recorded in Appendix C. Section 4 hosts a modification (more accurately,
a completion) of the algorithms aimed at improving their efficiency.

Two extensions are indicated in section 5: the adaptation of the
present algorithms to handle linear equality constraints, and the
possibility of solving more general kinds of parametric problems than
(Pα).

1.1 Motivation of (Pα)

One motive for studying (Pα) was given in Chapter II. From
Proposition 6 of that chapter, which applies because of the above
assumptions, solving (Pα) for all 0 ≤ α ≤ 1 is exactly equivalent
to solving the vector maximum problem

(1)  "Maximize"  f_1(x), f_2(x)   subject to  g(x) ≥ 0 .
         x

That is, every efficient decision for (1) is an optimal solution of
(Pα) for some 0 ≤ α ≤ 1, and conversely.

Another reason for studying (Pα) is that it subsumes the standard
problem of concave programming. Suppose that it is desired to solve

(2)  Maximize  F(x)   subject to  g(x) ≥ 0 ,
        x

where F(x) is strictly concave and the constraint functions are all
concave. If x° is any feasible decision whatsoever of (2), put
(Pα) equal to

(3α)  Maximize  α F(x) + (1-α)(-1) Σ_{i=1}^{n} (x_i - x°_i)²
         x
      subject to  g(x) ≥ 0 .

Then x° clearly is the optimal solution of (3α) at α = 0, and (3α)
satisfies the assumptions required of (Pα) in the opening paragraph.
Applying an algorithm for parametric concave programming to (3α) beginning
with α = 0 and increasing α until α = 1, one obtains the optimal
solution to (3α) at α = 1, which is identical to (2). Hence a parametric
algorithm for (Pα) provides a "deformation" method of concave
programming.
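
A minimal sketch of the deformation idea follows. It is not one of the
algorithms developed in this chapter: the objective F, the box
constraints, the starting point x°, the α grid, and the use of projected
gradient ascent as the inner solver are all assumptions made for
illustration. The path starts at x° for α = 0 and ends at the optimal
solution of (2) for α = 1.

```python
X0 = (0.0, 0.0)          # known feasible starting decision x° (assumed)
CENTER = (2.0, 1.0)      # F(x) = -(x1-2)^2 - (x2-1)^2, an illustrative F


def project(x):
    """Projection onto the feasible box 0 <= x_i <= 1 (assumed constraints)."""
    return tuple(min(1.0, max(0.0, xi)) for xi in x)


def grad(x, alpha):
    """Gradient of alpha*F(x) - (1-alpha)*sum((x_i - x0_i)^2)."""
    return tuple(-2.0 * alpha * (xi - ci) - 2.0 * (1.0 - alpha) * (xi - x0i)
                 for xi, ci, x0i in zip(x, CENTER, X0))


def solve(alpha, x, step=0.25, tol=1e-10, max_iter=10_000):
    """Projected gradient ascent on (3 alpha), started from x."""
    for _ in range(max_iter):
        g = grad(x, alpha)
        x_new = project(tuple(xi + step * gi for xi, gi in zip(x, g)))
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


def deformation_path(steps=10):
    """Optimal solutions of (3 alpha) for alpha = 0, 1/steps, ..., 1."""
    x, path = X0, []
    for k in range(steps + 1):
        x = solve(k / steps, x)      # warm start from the previous optimum
        path.append(x)
    return path


PATH = deformation_path()
print(PATH[0], PATH[-1])
```

For this separable example the first coordinate reaches its bound at
α = 1/2 and stays there, while the second rises linearly to 1, so the
path ends at (1, 1), the constrained maximizer of the assumed F.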

Problem (3α) is capable of an interesting interpretation, which
we shall now sketch briefly. Consider an enterprise currently "operating"
at the (feasible) point x°, with a single criterion function F(x)
and a feasible operating region {x: g(x) ≥ 0}. Due to conservatism,
or a desire to avoid disrupting the operations of the enterprise
radically, or to a desire to hedge against the risk of a faulty decision
model, assume that the managers of the enterprise prefer to adjust the
operating point gradually from x° toward x*, where x* is optimal
in (2). If the managers have a quadratic loss function Σ (x_i - x°_i)²
associated with deviations from x°, the optimal solution to (3α)
as α varies from 0 to 1 gives an optimum path from x° to x*.

Since (Pα) for fixed α is of the form (2), the device represented
by (3α) can be used to find a starting optimal solution to
(Pα) if one exists (providing that a feasible decision is known), so
that the assumption stated in the introduction is not restrictive,
as asserted.

Of course, in place of (3α) one could use

(4α)  Maximize  α F(x) + (1-α) H(x)
         x
      subject to  g(x) ≥ 0 ,

where H(x) is a strictly concave function with a known maximum
over the feasible region.


1.2 Theoretical Foundation

The standard problem of concave programming can be written in the
form of (Pα_o) with α_o fixed. For simplicity of notation, we write
f(x;α) for α f_1(x) + (1-α) f_2(x). Hence (Pα_o) can be written as

(Pα_o)  Maximize  f(x;α_o)   subject to  g(x) ≥ 0 .
           x

Fundamental theoretical results concerning this problem have been given
by Kuhn and Tucker (1951). A version of their Theorem 3 is recorded
here without proof.

Theorem (Kuhn-Tucker):

Consider (Pα_o) with α_o fixed. Let f(x;α_o) and g_i(x)
(i = 1,...,m) be differentiable on the feasible region {x: g(x) ≥ 0},
let f(x;α_o) be concave on the feasible region, and let g_i(x)
(i = 1,...,m) be concave on E^n. Assume that the constraint functions
satisfy the Kuhn-Tucker Constraint Qualification (see the remark
following the statement of the theorem).

Then x° is an optimal solution of (Pα_o) if and only if there
exist m real numbers λ°_i such that (x°,λ°) satisfies the following
(Kuhn-Tucker) conditions2/ at α = α_o:

(5)  ∇_x f(x;α) + Σ_{i=1}^{m} λ_i ∇_x g_i(x) = 0

(6)  g_i(x) ≥ 0 ,  λ_i ≥ 0 ,  i = 1, ..., m

(7)  g_i(x) > 0  implies  λ_i = 0 ,  i = 1, ..., m .

Remark: For a statement and discussion of the Kuhn-Tucker Constraint
Qualification, see Kuhn and Tucker (1951, p. 485) or Arrow,
Hurwicz, and Uzawa (1961). It has been shown, for example,
that if all the constraints are linear then this qualification
is satisfied; and that the existence of an interior point
of the feasible region is also sufficient for the qualification
to be satisfied. The sufficient condition which will
be of direct use in the sequel is: if x*(α_o) is an optimal
solution of (Pα_o), then the matrix whose rows are
∇_x g_i(x*(α_o)), i such that g_i(x*(α_o)) = 0, is of maximal
rank (see Arrow, Hurwicz, and Uzawa, 1961).

Direct analytical or numerical attempts to satisfy these conditions
have proven quite difficult, in general.

2/ The symbol ∇_x denotes the gradient of a function of several variables,
e.g., ∇_x g_i(x) = (∂g_i(x)/∂x_1, ..., ∂g_i(x)/∂x_n).

We shall find the following equivalent version of the Kuhn-Tucker
Theorem more suitable for our purposes.

Theorem (Kuhn-Tucker, an alternate version):

Assume that the hypotheses of the Kuhn-Tucker Theorem are satisfied.
Then x° is an optimal solution of (Pα_o) if and only if there
exist m real numbers u°_i and a subset S° of constraint indices
such that (x°,u°,S°) satisfies the following conditions at α = α_o:

(KT-1)  ∇_x f(x;α) + Σ_{i=1}^{m} u_i ∇_x g_i(x) = 0       }
(KT-2)  g_i(x) = 0, ∀ i ∈ S ;   u_i = 0, ∀ i ∉ S           }  (=S)α

(KT-3)  g_i(x) ≥ 0, ∀ i ∉ S

(KT-4)  u_i ≥ 0, ∀ i ∈ S .
Equations (KT-1) and (KT-2) appear so often together in the sequel
that we introduce the special symbol (=S)α to denote them (in this
notation, S and α may vary). We also denote the set of the first m
positive integers by M.

The equivalence of the two versions of this theorem follows from
the easily verified

Proposition 1:

(i) If (x°,λ°) satisfies (5) through (7) at α_o, then
(x°,λ°,S°) satisfies (KT-1) through (KT-4) at α_o for
any S° satisfying

(8)  {i ∈ M: λ°_i > 0} ⊂ S° ⊂ {i ∈ M: g_i(x°) = 0} .

(ii) If (x°,u°,S°) satisfies (KT-1) through (KT-4) at α_o,
then (x°,u°) satisfies (5) through (7) at α_o.

The numbers λ_i or u_i will be referred to as dual variables.
In view of Proposition 1 it is useless to distinguish between λ and
u; henceforth we shall use the symbol u to refer to the dual variables
of either version of the Kuhn-Tucker Theorem.

The concept of a valid set plays a central role in this work.
A subset S° of constraint indices is said to be valid at α_o if
and only if there exists (x°,u°) such that (x°,u°,S°) satisfies
(KT-1) through (KT-4) at α_o.
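
The definition of a valid set can be made concrete by brute-force
enumeration on a tiny example. The sketch below is only illustrative and
every specific in it is an assumption: two variables,
f_1(x) = -Σ (x_i - c_i)² with c = (2, 1), f_2(x) = -Σ x_i², and
constraints g_i(x) = 1 - x_i ≥ 0. Because the example is separable,
(=S)α has a unique solution that can be written in closed form, so S is
valid exactly when that solution also satisfies (KT-3) and (KT-4).
Constraint indices are 0-based in the code.

```python
from itertools import combinations

C = (2.0, 1.0)   # f1(x) = -sum((x_i - C[i])^2); f2(x) = -sum(x_i^2)
B = (1.0, 1.0)   # constraints g_i(x) = B[i] - x_i >= 0
TOL = 1e-12


def solve_equalities(S, alpha):
    """Solve (=S)alpha in closed form for this separable example."""
    x, u = [], []
    for i in range(len(B)):
        if i in S:
            xi = B[i]                              # (KT-2): g_i(x) = 0
            ui = -2.0 * xi + 2.0 * alpha * C[i]    # (KT-1) with dg_i/dx_i = -1
        else:
            xi = alpha * C[i]                      # (KT-1) with u_i = 0
            ui = 0.0                               # (KT-2): u_i = 0 off S
        x.append(xi)
        u.append(ui)
    return x, u


def is_valid(S, alpha):
    """Check (KT-3) and (KT-4) at the solution of (=S)alpha."""
    x, u = solve_equalities(S, alpha)
    kt3 = all(B[i] - x[i] >= -TOL for i in range(len(B)) if i not in S)
    kt4 = all(u[i] >= -TOL for i in S)
    return kt3 and kt4


def valid_sets(alpha):
    """All subsets of constraint indices that are valid at alpha."""
    indices = range(len(B))
    subsets = [s for k in range(len(B) + 1) for s in combinations(indices, k)]
    return [s for s in subsets if is_valid(s, alpha)]


for a in (0.25, 0.5, 0.75, 1.0):
    print(a, valid_sets(a))
```

In this example the empty set is valid below α = 1/2 and the set
containing the first constraint is valid above it; at α = 1/2 both are
valid, because the first constraint is binding with a zero dual
variable, which is the slack allowed by (8).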


Proposition 2:

A subset S° of constraint indices is valid at α_o if and only
if S° satisfies (8) for some (x°,λ°) which satisfies (5)
through (7) at α_o.

Proof: Assume that S° is valid at α_o. Then there exists (x°,u°)
such that (x°,u°,S°) satisfies (KT-1) through (KT-4) at α_o, which
implies by part (ii) of Proposition 1 that (x°,λ°), with λ° = u°,
satisfies (5) through (7) at α_o. By (KT-2) and (KT-4),
{i ∈ M: λ°_i > 0} ⊂ S° holds. By (KT-2), S° ⊂ {i ∈ M: g_i(x°) = 0}
holds. This proves necessity.

Assume now that S° satisfies (8) for some (x°,λ°) satisfying (5)
through (7) at α_o. By part (i) of Proposition 1, (x°,λ°,S°) satisfies
(KT-1) through (KT-4) at α_o, which shows that S° is valid at α_o.
The alternate version encourages the important observation that the
Kuhn-Tucker Conditions may be viewed as the Lagrange multiplier equations3/
applied to a subset S of the constraints, augmented by the inequations
(KT-3) and (KT-4). Attention thereby focuses on discovering the
identity of a valid set, for if one knew a valid set S* then in
principle one could solve (=S*)α_o for all solutions (x',u'),
among which at least one would satisfy (KT-3) and (KT-4) and hence
solve (Pα_o). Indeed, at least one algorithm (see Theil and Van de
Panne, 1960, and also Boot, 1961) has already been proposed which is
essentially aimed at determining a valid set. However, this approach
is probably not very efficient computationally, for although it reduces
the concave programming problem to one of solving sets of simultaneous
equations, there is a vast number of candidate sets of equations to
be tried when a valid set is not known. It seems to be difficult, even
for problems of modest size, to know how to order the trials so as to
keep the number of erroneous trials at a reasonable level. This
combinatorial difficulty is further aggravated by the numerical burden
of actually solving (=S)α_o. Thus we may expect the customary gradient
methods to be more efficient than methods based on the "valid set
approach."

3/ The method of Lagrange multipliers (see, e.g., Apostol, 1957, p. 153)
gives a set of first order necessary conditions for a point x° to be
an optimal solution of the problem

   Maximize  f(x)   subject to  g_i(x) = 0 ,  i = 1, ..., m .
      x

Assume that f(x) and g_i(x) (i = 1,...,m) are continuously differentiable
on some open region containing the feasible region, and that the
matrix whose rows are ∇_x g_i(x°), i = 1,...,m, is of maximal rank (note
that this last assumption implies that m ≤ n, where n is the dimension
of x). If x° is an optimal solution of the above problem, then
there exist m real numbers λ°_i such that (x°,λ°) satisfies the
(Lagrange multiplier) equations:

   ∇_x f(x) + Σ_{i=1}^{m} λ_i ∇_x g_i(x) = 0   and

   g_i(x) = 0 ,  i = 1, ..., m .

Let us turn now to parametric programming. It is perhaps surprising,
in view of the immediately preceding comments, that here methods based
on the "valid set approach" seem to have the advantage over gradient
methods. In fact the parametric programming algorithms (cf. section 3
of Chapter II) of Markowitz (1956), Houthakker (1960), and Zahl (1964)
each may be viewed as maintaining the identity of a valid set as a
parameter is varied.
Under appropriate assumptions the optimal solution x*(α) of
(Pα) and the associated dual variables u*(α) are unique and continuous.
This fact, coupled with the observation that there is only
a finite number of subsets of constraints, suggests that if S' is
valid at α_o, say, then S' is likely to be valid in some interval
including α_o. If this is the case, then one may derive x*(α) and
u*(α) in that interval by solving (=S')α parametrically, and (KT-3)
and (KT-4) are automatically satisfied. If this is not the case,
then even though (=S')α may have a solution near α_o, either (KT-3)
or (KT-4) will be violated, and it is necessary to find a new valid
set before being able to proceed. Because of continuity, moreover,
a set which is valid near α_o will usually differ by only a few
constraint indices from S'. This approach leads to a decomposition of
(Pα) on [0,1] into a chain of parametric subproblems. Each subproblem
involves the parametric solution of the Lagrange multiplier
equations associated with the constraints specified by a constant valid
set on a subinterval of [0,1]. By continuity the optimal terminal
solution to one subproblem is the optimal initial solution to the
next subproblem of the chain, and the valid sets of adjacent subproblems
are both valid at the transition point between them.

Thus  parametric  programming  can  be  reduced  essentially  to  the 
problem  in  numerical  analysis  of  solving  parameterized  (nonlinear, 
in  general)  simultaneous  equations.  This  approach  to  parametric 
programming  turns  out  to  be  a  useful  one  computationally,  since  the 
systems of equations involved will be shown to be well-behaved. By
applying Newton's method (see Appendix C), second order convergence
can be achieved as the parameter increases by discrete increments,
whereas gradient methods display roughly first order convergence.
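
The contrast between second order and first order convergence can be
seen on a one-variable instance. The sketch below is an illustration
under assumed data, not part of the original development: it takes the
hypothetical objectives f1(x) = -(x-2)^2 and f2(x) = -x^4/4 - x^2/2, no
binding constraints, and alpha = 1/2, so that the stationarity equation
reduces to x^3 + 3x - 4 = 0 with root x = 1. The function names and the
gradient step size 0.1 are likewise assumptions.

```python
def stationarity(x, alpha=0.5):
    """Derivative in x of alpha*f1 + (1 - alpha)*f2 for the assumed toy objectives."""
    return -2.0 * alpha * (x - 2.0) - (1.0 - alpha) * (x ** 3 + x)


def stationarity_slope(x, alpha=0.5):
    """Second derivative of the combined objective (negative for every x)."""
    return -2.0 * alpha - (1.0 - alpha) * (3.0 * x ** 2 + 1.0)


def newton_errors(x0, steps, root=1.0):
    """Distances to the root after each Newton step."""
    errors, x = [abs(x0 - root)], x0
    for _ in range(steps):
        x -= stationarity(x) / stationarity_slope(x)
        errors.append(abs(x - root))
    return errors


def gradient_errors(x0, steps, step_size=0.1, root=1.0):
    """Distances to the root after each fixed-step gradient (ascent) step."""
    errors, x = [abs(x0 - root)], x0
    for _ in range(steps):
        x += step_size * stationarity(x)
        errors.append(abs(x - root))
    return errors


print(newton_errors(1.5, 4))
print(gradient_errors(1.5, 4))
```

In this instance each Newton error is bounded by the square of the
previous one, while the gradient errors shrink only by a roughly
constant factor per step.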

2.  A  Basic  Conceptual  Algorithm 

In this section we state and prove a Basic Conceptual Algorithm
for solving $(P\alpha)$ for each value of $\alpha$ in the unit interval. We
use the adjective "conceptual" because computational implementation
is not considered at this point of the exposition. The Basic
Conceptual Algorithm can be modified and implemented in various ways,
as will be indicated in sections 3 and 4, thus giving rise to an entire
class of computational algorithms.

2.1 Assumptions

We assume that an optimal solution of $(P\alpha)$ is available for some
value of $\alpha$ in the unit interval, say $\alpha = 0$ (in view of the
discussion of subsection 1.1, this assumption is not restrictive).

Throughout this work the following conditions will be imposed
upon $(P\alpha)$. We denote the feasible region $\{x : g(x) \ge 0\}$ by $X$.

Condition 1: The functions $f_i(x)$ $(i = 1,2)$ and $g_i(x)$
$(i = 1,\ldots,m)$ are analytic on some open region
containing $X$, and the constraint functions are
concave on $X$.

Condition 2: $X$ is non-empty and bounded.

Condition 3: The hessian matrices $\nabla_x^2 f_i(x)$ $(i = 1,2)$ are
negative definite for all $x \in X$.

Condition 4: If $\alpha_0 \in [0,1]$ and $x^*(\alpha_0)$ is an optimal solution
of $(P\alpha_0)$, then the matrix whose rows are the
gradients $\nabla_x g_i(x^*(\alpha_0))$, $i$ such that
$g_i(x^*(\alpha_0)) = 0$, is of maximal rank.

A function $f(x_1,\ldots,x_n)$ of $n$ real variables is said to be
analytic in a region $R$ if in some neighborhood of every point of $R$
the function is the sum of a convergent power series with real
coefficients. The class of all analytic functions includes, for example,
all polynomials, and seems amply wide enough to include nearly any
continuous function likely to be encountered in applications.

Conditions 1 and 2 imply, by A.1 of Appendix A, that $X$ is
convex and compact.

Condition 3 implies, by A.5, that $f_1$ and $f_2$ are strictly
concave on $X$. This, in turn, implies by A.4 that
$f(x;\alpha) = \alpha f_1(x) + (1-\alpha) f_2(x)$ is strictly concave on $X$
for each fixed value of $\alpha \in [0,1]$. In the presence of Conditions 1
and 2, this last assertion remains true even on some open interval
containing [0,1], as Proposition 3 shows.




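Proposition 3: Under Conditions 1, 2, and 3, the hessian matrix
$\nabla_x^2 f(x;\alpha)$ is negative definite on $X$ for each fixed value
of $\alpha$ in some open interval containing [0,1].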


Proof: It is well-known that $\nabla_x^2 f(x;\alpha)$ is negative definite at
$(x,\alpha)$ if and only if all of its eigenvalues
$\xi_\mu(\nabla_x^2 f(x;\alpha))$ $(\mu = 1,\ldots,n)$ are negative, i.e., if
$\max_\mu \xi_\mu(\nabla_x^2 f(x;\alpha)) < 0$. Assume for the moment
that the last-mentioned function is continuous in $(x,\alpha)$ on some open
region containing $X \times [0,1]$, where $\times$ denotes the Cartesian
product. Since a positive sum of negative definite matrices is again
negative definite, from Condition 3 it follows that

$$\max_\mu \xi_\mu(\nabla_x^2 f(x;\alpha)) < 0$$

on $X \times [0,1]$. The proposition follows from this fact, the assumed
continuity, and the compactness of $X \times [0,1]$.

To see that $\max_\mu \xi_\mu(\nabla_x^2 f(x;\alpha))$ is continuous on some
open region containing $X \times [0,1]$, observe that Condition 1 implies
that the elements of $\nabla_x^2 f(x;\alpha)$ are all continuous on some open
region containing $X \times [0,1]$. Since the eigenvalues of a square
matrix are continuous functions of its elements (Ostrowski, 1960,
p. 192), $\xi_\mu(\nabla_x^2 f(x;\alpha))$ $(\mu = 1,\ldots,n)$ is therefore
continuous on some open region containing $X \times [0,1]$; the same must
be true for $\max_\mu \xi_\mu(\nabla_x^2 f(x;\alpha))$.
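
The eigenvalue criterion used in this proof can be tried out on a small
instance. The sketch below is only an illustration: the constant 2 by 2
hessians `H1` and `H2` are assumed data (they would arise from
hypothetical quadratic objectives), and the helper names are invented
here. It evaluates the largest eigenvalue of the hessian of
alpha*f1 + (1-alpha)*f2 on a grid extending slightly beyond [0,1].

```python
import math

H1 = [[-2.0, 0.5], [0.5, -1.0]]   # assumed hessian of f1 (negative definite)
H2 = [[-1.0, 0.0], [0.0, -3.0]]   # assumed hessian of f2 (negative definite)


def max_eigenvalue(h):
    """Largest eigenvalue of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = h[0][0], h[0][1], h[1][1]
    return 0.5 * (a + c) + math.hypot(0.5 * (a - c), b)


def combined_hessian(alpha):
    """Hessian of f(x; alpha) = alpha*f1(x) + (1 - alpha)*f2(x)."""
    return [[alpha * H1[i][j] + (1.0 - alpha) * H2[i][j] for j in range(2)]
            for i in range(2)]


def negative_definite_on(alphas):
    """True if the combined hessian is negative definite at every alpha given."""
    return all(max_eigenvalue(combined_hessian(a)) < 0.0 for a in alphas)


grid = [-0.05 + 0.01 * k for k in range(111)]   # covers [-0.05, 1.05]
print(negative_definite_on(grid))
```

For these assumed matrices negative definiteness persists a little
beyond both end-points of [0,1], but it is lost for values of alpha far
outside the unit interval (alpha = 3, for example).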


Remark: As indicated in Footnote 1 of this chapter, Condition 3 may
be weakened to (in the following, $\epsilon > 0$ is arbitrarily
small): (a) $\nabla_x^2 f_1(x)$ negative definite and
$\nabla_x^2 f_2(x)$ negative (semi-)definite for all $x \in X$,
if [0,1] is replaced by $[\epsilon,1]$, or
(b) $\nabla_x^2 f_1(x)$ negative (semi-)definite and
$\nabla_x^2 f_2(x)$ negative definite for all $x \in X$,
if [0,1] is replaced by $[0,1-\epsilon]$, or
(c) $\alpha \nabla_x^2 f_1(x) + (1-\alpha) \nabla_x^2 f_2(x)$ is negative
definite for all $x \in X$ at each $\alpha \in (0,1)$, if [0,1] is
replaced by $[\epsilon,1-\epsilon]$.

Condition 4 is equivalent to requiring that the gradients
$\nabla_x g_i(x^*(\alpha_0))$, $i$ such that $g_i(x^*(\alpha_0)) = 0$, must be
linearly independent; hence at most $n$ constraints can be satisfied
with exact equality at an optimal solution of $(P\alpha_0)$. In the remark
following the Kuhn-Tucker Theorem, it was noted that this condition
implies that the Kuhn-Tucker Constraint Qualification holds. Thus
the hypotheses of the Kuhn-Tucker Theorem are satisfied by $(P\alpha)$
for each fixed $\alpha \in [0,1]$ when Conditions 1, 3, and 4 hold.
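
A numerical check of Condition 4 amounts to a rank computation on the
gradients of the binding constraints. The following is a minimal sketch
under assumed data: the function names are hypothetical, and the example
gradients belong to the invented constraints 1 - x1 >= 0, 1 - x2 >= 0,
and 2 - x1 - x2 >= 0, all of which bind at the point (1,1).

```python
def row_rank(rows, tol=1e-12):
    """Rank of the matrix with the given rows, by Gaussian elimination."""
    a = [list(map(float, r)) for r in rows]
    n_cols = len(a[0]) if a else 0
    rank = 0
    for c in range(n_cols):
        # Pick the largest remaining entry in column c as pivot, if any rows remain.
        pivot = max(range(rank, len(a)), key=lambda r: abs(a[r][c]), default=None)
        if pivot is None or abs(a[pivot][c]) <= tol:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, len(a)):
            factor = a[r][c] / a[rank][c]
            for j in range(c, n_cols):
                a[r][j] -= factor * a[rank][j]
        rank += 1
    return rank


def condition4_holds(binding_gradients):
    """True if the binding-constraint gradients are linearly independent."""
    return row_rank(binding_gradients) == len(binding_gradients)


two_binding = [(-1.0, 0.0), (0.0, -1.0)]
three_binding = [(-1.0, 0.0), (0.0, -1.0), (-1.0, -1.0)]
print(condition4_holds(two_binding), condition4_holds(three_binding))
```

With three constraints binding at a point of the plane the gradients
cannot be independent, which illustrates the observation that at most
$n$ constraints may hold with equality when Condition 4 is in force.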

2.2 Statement of the Basic Conceptual Algorithm

For convenience we view $\alpha$ as increasing from 0 toward 1.

Step 1: Solve $(P0)$ by any convenient method, so that
$(x^*(0), u^*(0), S^*)$ satisfying (KT-1) through (KT-4)
at $\alpha = 0$ is at hand. Put $\alpha^\circ = 0$, $S^\circ = S^*$, and
$(x,u)^\circ = (x^*(0), u^*(0))$.

Step 2: Solve equations $(=S^\circ)\alpha$ by any convenient method as
$\alpha$ increases above $\alpha^\circ$ for the unique continuous
solution $(x^{S^\circ}(\alpha), u^{S^\circ}(\alpha))$ satisfying the left
end-point value $(x,u)^\circ$ so long as this solution satisfies
(KT-3) and (KT-4); that is, until $\alpha = \alpha'$, where

$$\alpha' = \max \{\alpha : \alpha^\circ \le \alpha \le 1,\ g_i(x^{S^\circ}(\alpha)) \ge 0\ \ \forall i \notin S^\circ,\ u_i^{S^\circ}(\alpha) \ge 0\ \ \forall i \in S^\circ \text{ on } [\alpha^\circ, \alpha]\}.$$

If $\alpha' = 1$, terminate. Otherwise put $(x,u)^\circ =
(x^{S^\circ}(\alpha'), u^{S^\circ}(\alpha'))$ and go to Step 3.

(Footnote: Throughout this work we employ the symbol
$(x^S(\alpha), u^S(\alpha))$ to denote a solution of equations $(=S)\alpha$.)

Step 3: Solve equations $(=S)\alpha$ by any convenient method as
$\alpha$ increases above $\alpha'$ for the unique continuous
solution $(x^S(\alpha), u^S(\alpha))$ satisfying the left end-point
value $(x,u)^\circ$ for different sets $S$ which satisfy

(8.1) $\{i \in M : u_i^{S^\circ}(\alpha') > 0\} \subseteq S \subseteq \{i \in M : g_i(x^{S^\circ}(\alpha')) = 0\}$

until for some $S'$, $(x^{S'}(\alpha), u^{S'}(\alpha))$ satisfies (KT-3)
and (KT-4) on $[\alpha', \alpha'+\epsilon]$ for some $\epsilon > 0$. Put
$\alpha^\circ = \alpha'$, $S^\circ = S'$, and return to Step 2.
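
The family of trial sets admitted by (8.1) is finite and can be listed
explicitly. The sketch below is an illustration only; the function name
`candidate_sets` and its arguments are hypothetical. It takes the index
set of strictly positive dual variables and the index set of binding
constraints at the transition point, and returns every set lying
between them, smallest first.

```python
from itertools import combinations


def candidate_sets(positive_duals, binding):
    """All index sets S with positive_duals <= S <= binding, as in (8.1)."""
    lower, upper = frozenset(positive_duals), frozenset(binding)
    if not lower <= upper:
        raise ValueError("the lower set must be contained in the upper set")
    free = sorted(upper - lower)   # indices that may or may not be included
    return [lower | frozenset(extra)
            for size in range(len(free) + 1)
            for extra in combinations(free, size)]


for s in candidate_sets({1}, {1, 2, 3}):
    print(sorted(s))
```

The number of candidates is 2 raised to the number of binding
constraints whose dual variables vanish, so Step 3 examines few sets
whenever those two index sets nearly coincide.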

The next subsection is devoted to the development of the
theoretical results necessary for justifying this conceptual algorithm.
Complete justification requires proof of the following

Theorem (Basic):

Assume that Conditions 1 through 4 hold. Then the following
assertions regarding the Basic Conceptual Algorithm hold:

(i) Step 2 is well-defined.

(ii) At each execution of Step 2, $(x^{S^\circ}(\alpha), u^{S^\circ}(\alpha)) =
(x^*(\alpha), u^*(\alpha))$ on $[\alpha^\circ, \alpha']$.

(iii) Step 3 is well-defined.

(iv) Step 3 will be executed only a finite number of times
before termination obtains.

2.3  Theoretical  Development 

Continuity plays a crucial role in parametric programming.

Theorem 1 (Continuity):

(i) Assume that Conditions 1 through 3 hold. Then $(P\alpha)$ has
a unique optimal solution $x^*(\alpha)$, and $x^*(\alpha)$ is continuous,
on some open interval containing [0,1].

(ii) Assume that Conditions 1 through 4 hold. Then $(P\alpha)$ has
unique dual variables $u_i^*(\alpha)$ $(i = 1,\ldots,m)$ such that
$(x^*(\alpha), u^*(\alpha))$ satisfies the Kuhn-Tucker Conditions (5)
through (7), and $u^*(\alpha)$ is continuous, on some open interval
containing [0,1].

Proof: First we prove (i). The existence of an optimal solution
of $(P\alpha)$ for any fixed value of $\alpha$ follows from the fact that
$f(x;\alpha)$ is a continuous function of $x$ on the compact set $X$. The
uniqueness of the optimal solution follows by A.2 from the fact that
$f(x;\alpha)$ is strictly concave in $x$ over the convex set $X$ for each
fixed value of $\alpha$ in some open interval $\mathcal{A}$ containing [0,1].
Denote the unique optimal solution by $x^*(\alpha)$.

To demonstrate that $x^*(\alpha)$ is continuous on $\mathcal{A}$, suppose
the contrary. Then there exists a sequence
$\langle \alpha^\nu \rangle \to \bar\alpha$ with $\alpha^\nu, \bar\alpha \in \mathcal{A}$
such that $\langle x^*(\alpha^\nu) \rangle \not\to x^*(\bar\alpha)$. Hence there
is an (open) neighborhood $N(x^*(\bar\alpha))$ of $x^*(\bar\alpha)$ such that
$x^*(\alpha^\nu) \notin N(x^*(\bar\alpha))$ infinitely often, and by taking a
subsequence, if necessary, we may assume that this holds for all $\nu$.
Since $X - N(x^*(\bar\alpha))$ is compact we may assume, again taking a
subsequence if necessary, that
$\langle x^*(\alpha^\nu) \rangle \to x' \in (X - N(x^*(\bar\alpha)))$. Thus by the
continuity of $f(x;\alpha)$ with respect to $(x,\alpha)$, we obtain

(9) $\langle f(x^*(\alpha^\nu);\alpha^\nu) \rangle \to f(x';\bar\alpha)$.

Now $f(x^*(\alpha);\alpha) \equiv \max_x \{f(x;\alpha) \text{ subject to } g(x) \ge 0\}$
is the supremum of a family of functions linear in $\alpha$, and therefore
is convex in $\alpha$ on $\mathcal{A}$. Using A.5, we obtain

(10) $\langle f(x^*(\alpha^\nu);\alpha^\nu) \rangle \to f(x^*(\bar\alpha);\bar\alpha)$.

Assertions (9) and (10) imply that $f(x';\bar\alpha) = f(x^*(\bar\alpha);\bar\alpha)$;
but by construction $x' \ne x^*(\bar\alpha)$, so that the unique optimality of
$x^*(\bar\alpha)$ is violated. Hence $x^*(\alpha)$ must be continuous on
$\mathcal{A}$. This completes the proof of (i).

(Footnote: When used with sets, the symbol $-$ denotes relative
complement. Thus $X - N(x^*(\bar\alpha)) = \{x \in X : x \notin N(x^*(\bar\alpha))\}$.)

Now we prove (ii). The existence of $u^*(\alpha)$ such that $(x^*(\alpha),
u^*(\alpha))$ satisfies (5) through (7) on some open interval containing
[0,1] would follow from the necessity of the Kuhn-Tucker Conditions
if the hypotheses of the Kuhn-Tucker Theorem were satisfied by $(P\alpha)$
on such an interval. It was noted in subsection 2.1 that these
hypotheses are satisfied for each value of $\alpha \in [0,1]$. To show that
this remains true on some open interval containing [0,1], in view
of Condition 1, Proposition 3, and the remark following the statement
of the Kuhn-Tucker Theorem, it is enough to show that Condition 4 is
still satisfied on some open interval containing each end-point.
Consider the left end-point $\alpha = 0$. Denote by $D(\alpha)$ the matrix whose
rows are $\nabla_x g_i(x^*(\alpha))$, $i$ such that $g_i(x^*(0)) = 0$. By
Condition 4 applied at $\alpha = 0$, $D(0)$ has rank equal to the number of
its rows, which is equivalent to the existence of $[D(0)D^T(0)]^{-1}$,
which is equivalent to the determinantal inequality
$|D(0)D^T(0)| \ne 0$. Since $|D(\alpha)D^T(\alpha)|$ is a continuous function
of $\alpha$ for $\alpha$ sufficiently near 0, it does not vanish in some open
interval containing $\alpha = 0$, and so $D(\alpha)$ remains of maximal rank on
such an interval. This implies that Condition 4 holds on some open
interval containing $\alpha = 0$, for by the continuity of $x^*(\alpha)$, hence
of $g_i(x^*(\alpha))$, one easily obtains that
$\{i : g_i(x^*(\alpha)) = 0\} \subseteq \{i : g_i(x^*(0)) = 0\}$ for $\alpha$
sufficiently near 0. A similar argument applies to $\alpha = 1$.



To show the uniqueness and continuity of $u^*(\alpha)$ on some
open interval containing [0,1], fix $\alpha_0 \in [0,1]$. Since $x^*(\alpha)$ is
unique, from (7) we conclude that $u_i^*(\alpha_0)$ must vanish for each $i$
such that $g_i(x^*(\alpha_0)) > 0$. By the continuity of $g_i(x^*(\alpha))$, we
have that $g_i(x^*(\alpha)) > 0$ on some open interval about $\alpha_0$ when
$g_i(x^*(\alpha_0)) > 0$. Hence $u_i^*(\alpha)$ vanishes on some open interval
about $\alpha_0$ for each $i$ such that $g_i(x^*(\alpha_0)) > 0$. Denote
$\{i : g_i(x^*(\alpha_0)) = 0\}$ by $B$. It remains to consider $u_i^*(\alpha)$,
$i \in B$. From (5) and (7) one obtains

(11) $\nabla_x f(x^*(\alpha_0);\alpha_0) + \sum_{i \in B} u_i^*(\alpha_0) \nabla_x g_i(x^*(\alpha_0)) = 0$.

Since by continuity $\{i : g_i(x^*(\alpha)) = 0\} \subseteq \{i : g_i(x^*(\alpha_0)) = 0\} = B$
for $\alpha$ sufficiently near $\alpha_0$, it follows from (5) and (7) that (11)
must hold in some open interval about $\alpha_0$ with the same summation
set. That is,

(12) $\nabla_x f(x^*(\alpha);\alpha) + \sum_{i \in B} u_i^*(\alpha) \nabla_x g_i(x^*(\alpha)) = 0$

holds on some open interval about $\alpha_0$. Write $u_B^*(\alpha)$ for the row
vector whose components are $u_i^*(\alpha)$, $i \in B$, and let $D(\alpha)$ now
denote the matrix whose rows are $\nabla_x g_i(x^*(\alpha))$, $i \in B$. Then
(12) can be rewritten in matrix notation as

(12.1) $u_B^*(\alpha) D(\alpha) = -\nabla_x f(x^*(\alpha);\alpha)$.

Repeating a previous argument, one may assert that $[D(\alpha)D^T(\alpha)]^{-1}$
exists on some open interval containing $\alpha_0$. Postmultiplying (12.1)
by $D^T(\alpha)[D(\alpha)D^T(\alpha)]^{-1}$, one obtains that $u_B^*(\alpha)$ must satisfy

(12.2) $u_B^*(\alpha) = -\nabla_x f(x^*(\alpha);\alpha) D^T(\alpha) [D(\alpha)D^T(\alpha)]^{-1}$

on some open interval containing $\alpha_0$. The right-hand side is unique
and continuous in $\alpha$, and therefore $u^*(\alpha)$ is also unique and
continuous on some open interval containing $\alpha_0$.
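
For readers who wish to see the dual variables produced by this formula
on a concrete case, here is a minimal sketch. It is an illustration
only, in keeping with the author's caution that the formula is meant
for theoretical rather than computational use. The function names are
hypothetical, and the data are assumed: the objective gradient and the
binding-constraint gradients are simply supplied as numbers.

```python
def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("binding gradients are linearly dependent")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= factor * aug[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def dual_variables(grad_f, binding_gradients):
    """Evaluate u_B = -grad_f D^T [D D^T]^{-1}; the rows of D are the binding gradients."""
    d = binding_gradients
    gram = [[sum(p * q for p, q in zip(ri, rj)) for rj in d] for ri in d]
    rhs = [-sum(p * q for p, q in zip(ri, grad_f)) for ri in d]
    # D D^T is symmetric, so the row-vector formula is equivalent to
    # solving (D D^T) u^T = -D grad_f^T.
    return solve_linear(gram, rhs)


# Assumed example: grad f = (3, 1); binding gradients (-1, 0) and (-1, -1).
u = dual_variables((3.0, 1.0), [(-1.0, 0.0), (-1.0, -1.0)])
print(u)
```

Substituting the result back, the gradient of the objective plus the
weighted sum of the constraint gradients vanishes, as (12) requires.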

It will prove convenient to introduce some special notations.
Define $A\alpha$ to be the set of constraint indices corresponding to the
constraints which are active at $\alpha$ in the sense that their dual
variables are strictly positive:

$A\alpha = \{i \in M : u_i^*(\alpha) > 0\}$.

Define $B\alpha$ to be the set of constraint indices corresponding to the
constraints which are binding at $x^*(\alpha)$:

$B\alpha = \{i \in M : g_i(x^*(\alpha)) = 0\}$.

The sets $A\alpha$ and $B\alpha$ are well-defined on some open interval
containing [0,1] because of the existence and uniqueness of $(x^*(\alpha),
u^*(\alpha))$ on some such interval. We can now state two important
corollaries of Theorem 1.

Corollary 1.1:

Assume that Conditions 1 through 4 hold. Then for each $\alpha_0 \in [0,1]$
there exists an open interval containing $\alpha_0$ such that, on
this interval,

$A\alpha_0 \subseteq A\alpha \subseteq B\alpha \subseteq B\alpha_0$.

Proof: The outermost relations follow directly from the definitions
of $A\alpha$ and $B\alpha$ and the continuity of $x^*(\alpha)$ and $u^*(\alpha)$. The
middle relation follows from (7).

(Footnote: Equation (12.2) is intended only for theoretical and not
computational use.)

Corollary 1.2:

Assume that Conditions 1 through 4 hold. Then there is an open
interval containing [0,1] such that, for each fixed value of
$\alpha$ in this open interval, a subset $S$ of constraint indices is
valid at $\alpha$ if and only if $A\alpha \subseteq S \subseteq B\alpha$.

Proof: This assertion is an immediate consequence of the
uniqueness of $(x^*(\alpha), u^*(\alpha))$, and Proposition 2.

The significance of Corollaries 1.1 and 1.2 is that the totality
of valid sets at $\alpha_0 \in [0,1]$ contains the totality of valid sets
for $\alpha$ sufficiently near $\alpha_0$. Hence the optimal solution of
$(P\alpha_0)$, which yields $A\alpha_0$ and $B\alpha_0$, gives a strong indication
of the identity of a valid set for $\alpha$ near $\alpha_0$.

The next theorem shows that equations $(=S)\alpha$ can be solved on
some open interval about $\alpha_0 \in [0,1]$ if $S$ is valid at $\alpha_0$.

Theorem 2:

Let $\alpha_0 \in [0,1]$ be fixed, let $S$ be valid at $\alpha_0$, and assume
Conditions 1 through 4 hold.

Then there exist an open interval $I\alpha_0$ containing and symmetric
about $\alpha_0$, and an open neighborhood $N(x^*(\alpha_0), u^*(\alpha_0))$
containing $(x^*(\alpha_0), u^*(\alpha_0))$, such that on $I\alpha_0$ there is a
unique function $(x^S(\alpha), u^S(\alpha))$ in $N(x^*(\alpha_0), u^*(\alpha_0))$
which satisfies $(=S)\alpha$.

Furthermore, $(x^S(\alpha), u^S(\alpha))$ is analytic on $I\alpha_0$.

Proof: The theorem would follow directly from a version of the
Implicit Function Theorem (Bochner and Martin, 1948, p. 39) applied
to the equations $(=S)\alpha$ if the following hypotheses of that theorem
were satisfied:

(a) $(x^*(\alpha_0), u^*(\alpha_0))$ satisfies $(=S)\alpha_0$.

(b) The left-hand side of each equation of $(=S)\alpha$ is analytic
in $(x,u,\alpha)$ in an open neighborhood of $(x^*(\alpha_0), u^*(\alpha_0), \alpha_0)$.

(c) The Jacobian $\partial((=S)\alpha_0)/\partial(x,u)$ is non-zero at
$(x^*(\alpha_0), u^*(\alpha_0))$.

By the validity of $S$ at $\alpha_0$, part (i) of Proposition 1 and Corollary
1.2, (a) holds. It follows from Condition 1 that (b) holds. To
simplify the task of showing that (c) holds, we regroup the order of
partial differentiation, which is equivalent to regrouping the columns
of the Jacobian matrix, so that we actually consider the Jacobian

$$\frac{\partial((=S)\alpha_0)}{\partial(x;\ u_i,\ i \in S;\ u_i,\ i \notin S)}.$$

Writing $H$ for the $n$ by $n$ hessian matrix

$$H = \nabla_x^2 f(x^*(\alpha_0);\alpha_0) + \sum_{i=1}^{m} u_i^*(\alpha_0) \nabla_x^2 g_i(x^*(\alpha_0))$$

and $D$ for the matrix whose rows are $\nabla_x g_i(x^*(\alpha_0))$, $i \in S$, one
readily derives that this Jacobian, evaluated at $(x^*(\alpha_0), u^*(\alpha_0))$,
is the determinant of the matrix (we use lines to denote partition)

$$\left[\begin{array}{cc|c} H & D^T & 0 \\ D & 0 & 0 \\ \hline 0 & 0 & I \end{array}\right]$$

where 0 and $I$ are zero and identity matrices of the appropriate
orders. The determinant is non-zero if and only if

$$\begin{bmatrix} H & D^T \\ D & 0 \end{bmatrix}$$

is invertible, which is true if and only if the matrix equation

(13) $\begin{bmatrix} H & D^T \\ D & 0 \end{bmatrix} \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

has $y = 0$, $z = 0$ as its only solution, where $y$ is an $n$-vector
and $z$ is a vector with a number of components equal to the number
of constraint indices in $S$. The proof of the theorem will be
complete when we show that (13) has only the null solution.

Performing the indicated block multiplications for (13), one
obtains

(13.1) $Hy + D^T z = 0$

(13.2) $Dy = 0$.

Now $H$ is negative definite, for it is a positive linear combination
of negative semidefinite hessians, at least one of which is known to
be negative definite. Hence $H$ is invertible, and (13.1) yields

(13.3) $y = -H^{-1} D^T z$.

Premultiplying (13.3) by $D$ and using (13.2), one obtains

(13.4) $Dy = -D H^{-1} D^T z = 0$.

By Corollary 1.2, $S \subseteq B\alpha_0$. By Condition 4, therefore, $D$
is of maximal rank, and that rank equals the number of rows of $D$.
Hence $[D H^{-1} D^T]$ is invertible, and (13.4) yields $z = 0$. By (13.3),
$y = 0$ also. Thus (13) has only the null solution.
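
The role of the two hypotheses in this argument (a negative definite
$H$ and a $D$ of full row rank) can be observed numerically. The sketch
below is an illustration with assumed data; the helper names
`determinant` and `bordered_matrix` are hypothetical. It assembles the
bordered matrix appearing in the proof and evaluates its determinant,
first with independent constraint gradients and then with dependent
ones.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det          # a row swap changes the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return det


def bordered_matrix(h, d):
    """Assemble [[H, D^T], [D, 0]] from H (n by n) and D (k by n)."""
    n, k = len(h), len(d)
    top = [list(h[i]) + [d[r][i] for r in range(k)] for i in range(n)]
    bottom = [list(d[r]) + [0.0] * k for r in range(k)]
    return top + bottom


H = [[-2.0, 0.0], [0.0, -1.0]]             # assumed negative definite hessian
D_full_rank = [[1.0, 1.0]]                 # one constraint gradient
D_deficient = [[1.0, 1.0], [2.0, 2.0]]     # two dependent gradients

print(determinant(bordered_matrix(H, D_full_rank)))
print(determinant(bordered_matrix(H, D_deficient)))
```

When the rows of $D$ are dependent the bordered matrix is singular,
which is why Condition 4 cannot be dropped from the theorem.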

Corollary 2.1:

Let $\alpha_0 \in [0,1]$ be fixed, let $S$ be valid at $\alpha_0$, and assume
that Conditions 1 through 4 hold.

Then there exists an open interval containing $\alpha_0$ and contained
in $I\alpha_0$ such that, for each fixed value of $\alpha$ in this interval,
the following three assertions are equivalent:

(i) $S$ is valid at $\alpha$.

(ii) $(x^S(\alpha), u^S(\alpha)) = (x^*(\alpha), u^*(\alpha))$.

(iii) $g_i(x^S(\alpha)) \ge 0$, $\forall i \notin S$;
$u_i^S(\alpha) \ge 0$, $\forall i \in S$.

Proof: (i) $\Rightarrow$ (ii). By continuity, $(x^*(\alpha), u^*(\alpha)) \in
N(x^*(\alpha_0), u^*(\alpha_0))$ for all $\alpha$ sufficiently near $\alpha_0$; by the
validity of $S$ at $\alpha$, part (i) of Proposition 1, and Corollary 1.2,
one concludes that $(x^*(\alpha), u^*(\alpha))$ satisfies $(=S)\alpha$; since the
solution of $(=S)\alpha$ is unique in $N(x^*(\alpha_0), u^*(\alpha_0))$, the
assertion (ii) follows.

(ii) $\Rightarrow$ (iii). Because $(x^*(\alpha), u^*(\alpha))$ satisfies (5)
through (7), (iii) must hold.

(iii) $\Rightarrow$ (i). Assertion (iii) and the fact that $(x^S(\alpha),
u^S(\alpha))$ satisfies $(=S)\alpha$ imply by the definition of validity that
$S$ is valid at $\alpha$.

One  more  result  must  be  established  before  a  complete  proof  of 
the  Basic  Theorem  can  be  given. 

Define a point of change of $B\alpha$ as a point $\alpha'$ with the
property that there is no open interval containing $\alpha'$ such that
$B\alpha = B\alpha'$ everywhere on that interval. A similar definition holds
for a point of change of $A\alpha$. In the sequel, the phrase "point of
change" is used to refer to either a point of change of $A\alpha$ or of
$B\alpha$, or possibly of both.

Theorem 3 (Finiteness):

Assume that Conditions 1 through 4 hold. Then $A\alpha$ and $B\alpha$
each have a finite number of points of change on [0,1].

Proof: Suppose that $B\alpha$ has an infinite number of points of
change on [0,1]. Then there is a cluster point $\bar\alpha \in [0,1]$ of
these points of change. Let $\langle \alpha^\nu \rangle$, $\alpha^\nu \in [0,1]$,
be a sequence of distinct points of change of $B\alpha$ which converges to
$\bar\alpha$. Applying Corollary 1.1 at $\alpha^\nu$, we see that there exists an
open interval containing $\alpha^\nu$ such that
$A\alpha^\nu \subseteq A\alpha \subseteq B\alpha \subseteq B\alpha^\nu$ on this interval.
By the definition of a point of change of $B\alpha$, for each $\alpha^\nu$ there
exists a number $\beta^\nu$ contained in this interval and in
$(\alpha^\nu - 1/\nu,\ \alpha^\nu + 1/\nu)$ such that
$A\alpha^\nu \subseteq A\beta^\nu \subseteq B\beta^\nu \subset B\alpha^\nu$ (note that
$B\beta^\nu$ is a proper subset of $B\alpha^\nu$). Clearly
$\langle \beta^\nu \rangle \to \bar\alpha$. From Corollary 1.1 applied at
$\bar\alpha$ we see that we have demonstrated the existence of two sequences
$\langle \alpha^\nu \rangle \to \bar\alpha$, $\langle \beta^\nu \rangle \to \bar\alpha$,
such that

$$A\bar\alpha \subseteq A\alpha^\nu \subseteq A\beta^\nu \subseteq B\beta^\nu \subset B\alpha^\nu \subseteq B\bar\alpha$$

for all $\nu$ sufficiently large. Since there is but a finite number
($2^m$) of possible sets which $B\beta^\nu$ or $B\alpha^\nu$ could possibly
be, we may assume, taking a subsequence if necessary, that there
exist sets $B'$ and $B''$ such that $B\beta^\nu = B'' \subset B\alpha^\nu = B'$
for all $\nu$.

Consider the function $x^{B''}(\alpha)$ defined as in Theorem 2 applied
at $\bar\alpha$. Since $B''$ is valid at $\bar\alpha$ and at all $\alpha^\nu$ and
$\beta^\nu$, $\nu$ sufficiently large, $x^{B''}(\alpha) = x^*(\alpha)$ at these
points. Take $i_0 \in B' - B''$. Then $g_{i_0}(x^{B''}(\alpha^\nu)) = 0$ and
$g_{i_0}(x^{B''}(\beta^\nu)) > 0$, all $\nu$ sufficiently large, and
$g_{i_0}(x^{B''}(\bar\alpha)) = 0$. In other words, we have shown that
$\bar\alpha$ is a non-isolated zero of $g_{i_0}(x^{B''}(\alpha))$, and that this
function is not identically zero on any open interval about $\bar\alpha$.
But this leads to a contradiction of the well-known fact (Apostol,
1957, p. 518) that the zeros of an analytic function which is not
identically zero are isolated, for by Theorem 2 and Condition 1 we
have that $g_{i_0}(x^{B''}(\alpha))$ is analytic on some open interval about
$\bar\alpha$. Hence the supposition that $B\alpha$ has an infinite number of
points of change on [0,1] is false.

A similar argument shows that $A\alpha$ cannot have an infinite number
of points of change on [0,1].

Applying the result of Theorem 3 to a given $(P\alpha)$, define
$0 < \alpha'_1 < \cdots < \alpha'_N < 1$ to be the collection of all points of
change of $A\alpha$ or $B\alpha$ or both. As a matter of convention we take
$\alpha'_0 = 0$ and $\alpha'_{N+1} = 1$. From Corollaries 1.1 and 1.2 we
conclude that any set which is valid at $\alpha$,
$\alpha'_j < \alpha < \alpha'_{j+1}$, is also valid on the entire closed interval
$[\alpha'_j, \alpha'_{j+1}]$. In addition, it may also be valid on other
intervals, of course. Among the sets which are valid at $\alpha'_j$, there
are all those which are valid on $[\alpha'_{j-1}, \alpha'_j]$ or on
$[\alpha'_j, \alpha'_{j+1}]$.

We  are  now  in  a  position  to  prove  the  Basic  Theorem. 


Proof (Basic Theorem): First we prove parts (i) and (ii). At
the beginning of each Step 2, $(x,u)^\circ$ and $S^\circ$ satisfy (KT-1)
through (KT-4) at $\alpha^\circ$, so that $S^\circ$ is valid at $\alpha^\circ$ and
$(x,u)^\circ = (x^*(\alpha^\circ), u^*(\alpha^\circ))$. Let $J$, $1 \le J \le N+1$, be the
largest integer such that $S^\circ$ is valid on $[\alpha^\circ, \alpha'_J]$
($\alpha'_J = \alpha^\circ = 0$ is permissible the first time Step 2 is
executed). Applying Theorem 2 at each point of $[\alpha^\circ, \alpha'_J]$, it
follows that $(=S^\circ)\alpha$ has a unique analytic solution
$(x^{S^\circ}(\alpha), u^{S^\circ}(\alpha))$ satisfying the left end-point value
$(x,u)^\circ$ on some interval containing $[\alpha^\circ, \alpha'_J]$. This solution
satisfies (KT-3) and (KT-4) and equals $(x^*(\alpha), u^*(\alpha))$ on
$[\alpha^\circ, \alpha'_J]$ by Corollary 2.1. If $\alpha'_J = 1$, the solution of
$(P\alpha)$ on [0,1] is complete. If $\alpha'_J < 1$, however,
$(x^{S^\circ}(\alpha), u^{S^\circ}(\alpha))$ does not satisfy (KT-3) and (KT-4) for
any $\alpha \in (\alpha'_J, \alpha'_{J+1})$, for otherwise by Corollary 2.1
applied at $\alpha'_J$, $S^\circ$ would be valid on $[\alpha'_J, \alpha'_{J+1}]$,
which would violate the definition of $J$. Clearly the scalar $\alpha'$
defined in Step 2 is precisely $\alpha'_J$, and (i) and (ii) hold.

Next we prove (iii). Any set $S$ which satisfies (8.1) is valid
at $\alpha'$, by Corollary 1.2 and the fact that
$(x^{S^\circ}(\alpha'), u^{S^\circ}(\alpha')) = (x^*(\alpha'), u^*(\alpha'))$. Applying
Theorem 2 at $\alpha'$, we see that if $S$ satisfies (8.1) then $(=S)\alpha$ has
a solution as stated on $[\alpha', \alpha'+\epsilon_1]$ for some
$\epsilon_1 > 0$. By Corollary 1.1 we know that at least one such
$S$, say $S'$, is valid on $[\alpha', \alpha'+\epsilon_2]$ for some
$0 < \epsilon_2 \le \epsilon_1$; by Corollary 2.1 applied at $\alpha'$,
$(x^{S'}(\alpha), u^{S'}(\alpha))$ satisfies (KT-3) and (KT-4) on
$[\alpha', \alpha'+\epsilon]$ for some $0 < \epsilon \le \epsilon_2$. Since there
is but a finite number of sets satisfying (8.1), $S'$ will be found
after a finite number of trials.

Finally, we prove (iv). It was established in the proof of (i)
that Step 3 is entered each time a point of change $\alpha'$ is encountered
at Step 2 such that the current set $S^\circ$ being used at Step 2 is not
valid immediately above $\alpha'$. It was established in the proof of
(iii) that Step 3 finds a set which is valid immediately above $\alpha'$
in a finite number of trials, and control is returned to Step 2 along
with the new valid set. By convention we have taken $\alpha$ increasing,
and by Theorem 3 there is but a finite number of points of change on
[0,1]; it follows that Step 3 will only have to be executed a finite
number of times before termination obtains.


Remark: At Step 2, $\alpha'$ need not be the next point of change above
$\alpha^\circ$, for $S^\circ$ may remain valid on an interval spanning several
points of change. The algorithm could be modified to require
$S^\circ = B\alpha$ at Step 2, so that $\alpha'$ would assume, in turn, the
values of each point of change of $B\alpha$; or one could require
$S^\circ = A\alpha$ at Step 2, so that $\alpha'$ would assume, in turn, the
values of each point of change of $A\alpha$. The minimum requirement
(the one adopted here) is $A\alpha \subseteq S^\circ \subseteq B\alpha$ at Step 2,
and seems more symmetrical and less arbitrary than either of the
extreme requirements just mentioned.

From  the  proof  of  the  Basic  Theorem,  it  is  clear  that  the  Basic 
Conceptual  Algorithm  can  be  paraphrased  as  follows. 

Step 1: By any convenient method, find the optimal solution
$(x^*(0), u^*(0))$ of $(P0)$. Set $\alpha^\circ = 0$, $S^\circ$ equal
to any set valid at $\alpha = 0$, and $(x,u)^\circ = (x^*(0), u^*(0))$.

Step 2: Solve $(=S^\circ)\alpha$ as $\alpha$ increases above $\alpha^\circ$ for its
unique continuous solution satisfying the left end-point
condition $(x^{S^\circ}(\alpha^\circ), u^{S^\circ}(\alpha^\circ)) = (x,u)^\circ$, namely
$(x^*(\alpha), u^*(\alpha))$, until either $\alpha = 1$ or a point of
change $\alpha'$ of $A\alpha$ or $B\alpha$ is encountered to the right
of which $S^\circ$ is no longer valid. In the first case,
terminate; in the second case, set $(x,u)^\circ = (x^*(\alpha'),
u^*(\alpha'))$ and go to Step 3.

Step 3: Among all sets valid at $\alpha'$, find one which is valid to
the right of $\alpha'$. Call it $S'$. Set $\alpha^\circ = \alpha'$, $S^\circ = S'$,
and return to Step 2.

See  Appendix  B  for  graphical  illustrations  of  this  algorithm. 
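
The three steps can also be traced on a one-variable toy problem. The
sketch below is a discretized illustration under stated assumptions,
not the conceptual algorithm itself: the problem (maximize
alpha*f1 + (1-alpha)*f2 with f1(x) = -(x-2)^2, f2(x) = -x^2, subject to
g(x) = 1 - x >= 0) is invented for the purpose, its equations are
solved in closed form, and alpha moves on a grid chosen so that the
single point of change (alpha = 1/2) falls on a grid point. All
function names are hypothetical.

```python
from itertools import combinations

M = (1,)   # the toy problem has a single constraint, index 1


def solve_equations(S, alpha):
    """Closed-form solution (x, u) of the equations for set S in the toy problem."""
    if 1 in S:
        return 1.0, 4.0 * alpha - 2.0   # g(x) = 0 together with stationarity
    return 2.0 * alpha, 0.0             # stationarity with u = 0


def satisfies_kt3_kt4(S, x, u, tol=1e-12):
    if 1 in S:
        return u >= -tol                # (KT-4): u_i >= 0 for i in S
    return 1.0 - x >= -tol              # (KT-3): g_i(x) >= 0 for i not in S


def candidate_sets(x, u, tol=1e-12):
    """Sets between {i: u_i > 0} and {i: g_i(x) = 0}, smallest first."""
    lower = frozenset(i for i in M if u > tol)
    upper = frozenset(i for i in M if abs(1.0 - x) <= tol)
    free = sorted(upper - lower)
    return [lower | frozenset(c) for k in range(len(free) + 1)
            for c in combinations(free, k)]


def basic_algorithm(num_steps=20):
    S = frozenset()                     # Step 1: a set valid at alpha = 0
    x, u = solve_equations(S, 0.0)
    path = [(0.0, S, x, u)]
    for k in range(1, num_steps + 1):
        alpha = k / num_steps
        x_new, u_new = solve_equations(S, alpha)            # Step 2
        if not satisfies_kt3_kt4(S, x_new, u_new):
            for S_try in candidate_sets(x, u):              # Step 3
                x_new, u_new = solve_equations(S_try, alpha)
                if satisfies_kt3_kt4(S_try, x_new, u_new):
                    S = S_try
                    break
            else:
                raise RuntimeError("no valid set found among the candidates")
        x, u = x_new, u_new
        path.append((alpha, S, x, u))
    return path


for alpha, S, x, u in basic_algorithm():
    print(alpha, sorted(S), x, u)
```

In this toy run the empty set is used up to alpha = 1/2, after which
the constraint enters the set and the dual variable grows linearly. A
practical implementation would locate the point of change by solving
for it rather than relying on the grid.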

Now  that  the  Basic  Conceptual  Algorithm  has  been  theoretically 
justified,  we  take  up  computational  considerations. 

3. A Basic Computational Algorithm

In order to implement the Basic Conceptual Algorithm, it is necessary
to have a method of actually solving $(=S)\alpha$ as $\alpha$ changes
parametrically. Only in certain simple cases is it possible or economical
to solve these equations analytically, and so usually numerical methods
must be used. We recommend Newton's method, or a variation thereof,
as an efficient means of solving $(=S)\alpha$ on a digital computer as
$\alpha$ changes by small discrete jumps.

After  proving  the  applicability  of  Newton's  method,  we  state  and 
prove  a  Basic  Computational  Algorithm.  Some  necessary  computational 
refinements  are  then  briefly  indicated,  with  further  details  being 
added  in  Appendix  C. 

3.1 Newton's Method

Newton's method is briefly reviewed in Appendix C. Under Conditions
1 through 4, it is easily seen from Theorem C.1 of Appendix C and the
proof of Theorem 2 that for each $\alpha_0 \in [0,1]$, Newton's method applied
to $(=S)\alpha_0$ is well-defined and quadratically convergent to
$(x^*(\alpha_0), u^*(\alpha_0))$ if $S$ is valid at $\alpha_0$ and if the starting
point $(x,u)^\circ$ is in a sufficiently small neighborhood of
$(x^*(\alpha_0), u^*(\alpha_0))$. Since $(x^*(\alpha), u^*(\alpha))$ is continuous, by
taking $\Delta\alpha$ small enough, $(x^*(\alpha_0-\Delta\alpha),
u^*(\alpha_0-\Delta\alpha))$ is such a starting point. In other words, Newton's
method is applicable point by point. Does there exist $\Delta\alpha > 0$ such
that a computational algorithm can be designed using Newton's method to
solve $(=S)\alpha$ with $\Delta\alpha$ as a fixed step size throughout? The answer
is affirmative, and requires a proof that the size of the neighborhoods
mentioned above may be taken to be bounded away from zero.
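
The idea of a fixed step size with Newton's method restarted from the
previous solution can be sketched as follows. This is an illustration
under assumed data rather than the algorithm developed below: the
objectives f1(x) = -(x-2)^2 and f2(x) = -x^4/4 - x^2/2 are hypothetical,
no constraint is binding (so the equations reduce to a single
stationarity condition), and the function names, tolerance, and step
count are choices made here.

```python
def residual(x, alpha):
    """Left-hand side of the stationarity equation for the assumed objectives."""
    return -2.0 * alpha * (x - 2.0) - (1.0 - alpha) * (x ** 3 + x)


def residual_slope(x, alpha):
    """Derivative of the residual with respect to x (negative for every x)."""
    return -2.0 * alpha - (1.0 - alpha) * (3.0 * x ** 2 + 1.0)


def newton_solve(x_start, alpha, tol=1e-12, max_iter=50):
    """Newton's method from x_start; returns the solution and iteration count."""
    x = x_start
    for iteration in range(max_iter + 1):
        if abs(residual(x, alpha)) < tol:
            return x, iteration
        x -= residual(x, alpha) / residual_slope(x, alpha)
    raise RuntimeError("Newton's method did not converge")


def trace_path(num_steps=20):
    """Follow the solution from alpha = 0 to 1 with fixed increment 1/num_steps."""
    x, path = 0.0, []   # x = 0 solves the toy problem at alpha = 0
    for k in range(num_steps + 1):
        alpha = k / num_steps
        x, iterations = newton_solve(x, alpha)   # start from the previous solution
        path.append((alpha, x, iterations))
    return path


for alpha, x, iterations in trace_path():
    print(alpha, x, iterations)
```

Because each solve starts from the solution at the preceding parameter
value, only a handful of Newton iterations are needed per step in this
example.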

Theorem 4.1:

Let Conditions 1 through 4 hold, let $\alpha_0 \in [0,1]$ not a point of
change be fixed, and let $S$ be valid at $\alpha_0$.

Then there exists a scalar $r' > 0$, which does not depend on
$\alpha_0$ or on $S$, such that Newton's method applied to equations
$(=S)\alpha_0$ is well-defined and quadratically convergent to
$(x^*(\alpha_0), u^*(\alpha_0))$ if the starting point $(x,u)^\circ$ is in the
($n+m$ dimensional) neighborhood $N_{r'}(x^*(\alpha_0), u^*(\alpha_0))$.

Proof:

1. We shall use the notation and observations immediately following
the proof of Theorem 3. To prove this theorem it is sufficient to
show that for each $j$ ($j = 0,\ldots,N$) there exists a scalar $r(j) > 0$
such that the following assertions hold on
$N_{r(j)}(x^*(\alpha_o), u^*(\alpha_o))$ for any fixed
$\alpha_o \in [\alpha'_j, \alpha'_{j+1}]$ and any $S$ valid on
$[\alpha'_j, \alpha'_{j+1}]$:

    (a) The left-hand side of each equation of $(=S)\alpha_o$ is
        twice continuously differentiable with respect to $(x,u)$.

    (b) The Jacobian $\dfrac{\partial((=S)\alpha_o)}{\partial(x,u)} \neq 0$.

    (c) $A(x,u;\alpha_o,S) < L < 1$, where $A(x,u;\alpha_o,S)$ is a certain
        upper estimate of the norm of the Jacobian matrix of the
        iteration function derived by applying Newton's method
        to $(=S)\alpha_o$ (see section 1 of Appendix C).

To see why this plan is sufficient, let $r' = \min\{r(0),\ldots,r(N)\}$,
let $\alpha_o \in [0,1]$ not a point of change be fixed, and let $S$ be valid
at $\alpha_o$. Then for some $j$ between 0 and $N$ we have that $S$ is
valid on $[\alpha'_j, \alpha'_{j+1}]$ and $\alpha_o \in [\alpha'_j, \alpha'_{j+1}]$.
Applying Theorem C.2 of Appendix C, we see that Newton's method applied
to $(=S)\alpha_o$ is well-defined and quadratically convergent to
$(x^*(\alpha_o), u^*(\alpha_o))$ for any starting point
$(x,u)^o \in N_{r'}(x^*(\alpha_o), u^*(\alpha_o))$.

2. Let $j$ be fixed, $0 \le j \le N$, let $\alpha_o \in [\alpha'_j, \alpha'_{j+1}]$,
and let $S$ be any set which is valid on $[\alpha'_j, \alpha'_{j+1}]$.

By Condition 1, the left-hand side of each equation of $(=S)\alpha_o$
is twice continuously differentiable with respect to $(x,u)$ on some
open neighborhood of $(x^*(\alpha_o), u^*(\alpha_o))$.

The Jacobian $\dfrac{\partial((=S)\alpha_o)}{\partial(x,u)} \neq 0$ at
$(x^*(\alpha_o), u^*(\alpha_o))$ by the proof of Theorem 2. As a
consequence of Condition 1, this Jacobian is continuous with respect to
$(x,u)$ on some open neighborhood of $(x^*(\alpha_o), u^*(\alpha_o))$.
One concludes that the Jacobian does not vanish in some open
neighborhood of $(x^*(\alpha_o), u^*(\alpha_o))$.

It can be shown in a straightforward manner (see Henrici, 1964,
p. 106) that $A(x,u;\alpha_o,S)$ vanishes at $(x^*(\alpha_o), u^*(\alpha_o))$.
By Condition 1 this function is continuous with respect to $(x,u)$ on
some open neighborhood of $(x^*(\alpha_o), u^*(\alpha_o))$. One concludes
that $A(x,u;\alpha_o,S) < L$, where $0 < L < 1$, on some open neighborhood
about $(x^*(\alpha_o), u^*(\alpha_o))$.

Summarizing this part of the proof, we assert that (a), (b),
and (c) hold on some open neighborhood of $(x^*(\alpha_o), u^*(\alpha_o))$
when $\alpha_o \in [\alpha'_j, \alpha'_{j+1}]$ and $S$ is any set which is
valid on $[\alpha'_j, \alpha'_{j+1}]$.

3. Since $(x^*(\alpha), u^*(\alpha))$ is continuous on the compact set
$[\alpha'_j, \alpha'_{j+1}]$, the image set

$$\Gamma = \{(x,u) : (x,u) = (x^*(\alpha), u^*(\alpha)) \text{ for some } \alpha,\ \alpha'_j \le \alpha \le \alpha'_{j+1}\}$$

is compact. It follows from the compactness of $\Gamma$ and the result
of part 2 of this proof that there exists a scalar $r(j) > 0$ such
that (a), (b), and (c) hold on $N_{r(j)}(x^*(\alpha_o), u^*(\alpha_o))$ when
$\alpha_o \in [\alpha'_j, \alpha'_{j+1}]$ and $S$ is any set which is valid
on $[\alpha'_j, \alpha'_{j+1}]$.

When  Conditions  1  through  4  hold,  we  define  to  be  the  minimum 

distance  between  any  two  points  of  change  on  [0,1],  and  to  be 

the  length  of  the  shortest  of  all  the  intervals  la^  defined  in  Theorem  2 

applied  at  every  point  of  change  on  [0,1]  with  each  set  which  is 

valid  at  each  point  of  change.  Define  Z  =  ^  i ^ .  Note  that 

(x®(a),  u®(a))  is  uniquely  defined,  by  Theorem  2  applied  at  , 

on  la'.  =  [a'.  -  1,  a'.  +  ^],  for  any  1  5  3  5  N  and  any  S  valid 
3  3  3 

at  a'.  . 

3 
Theorem 4.2:

Let Conditions 1 through 4 hold, let $\alpha' \in [0,1]$ be a particular
point of change, and let $S$ be valid at $\alpha'$.

Then there exist scalars $r'' > 0$ and $0 < \ell'' < \bar{\ell}$, which do
not depend on $\alpha'$ or on $S$, such that Newton's method applied
to $(=S)\alpha_o$ is well-defined and quadratically convergent to
$(x^S(\alpha_o), u^S(\alpha_o))$ if $\alpha_o \in [\alpha'-\ell'', \alpha'+\ell'']$
and if the starting point $(x,u)^o \in N_{r''}(x^S(\alpha_o), u^S(\alpha_o))$.

Proof:

1. Since there is a finite number of points of change on [0,1]
and a finite number of valid sets at each, it is sufficient to show
that the theorem holds with $r$ and $\ell$ possibly depending on $\alpha'$
and $S$. This will be done by applying Theorem C.2 of Appendix C.

2. Let $\alpha' \in [0,1]$ be a particular point of change, and let
$S$ be valid at $\alpha'$. It remains to demonstrate the existence of scalars
$r > 0$ and $0 < \ell < \bar{\ell}$ such that the following three assertions
hold on $N_r(x^S(\alpha_o), u^S(\alpha_o))$ when
$\alpha_o \in [\alpha'-\ell, \alpha'+\ell]$:

    (a) The left-hand side of each equation of $(=S)\alpha_o$ is
        twice differentiable with respect to $(x,u)$.

    (b) The Jacobian $\dfrac{\partial((=S)\alpha_o)}{\partial(x,u)} \neq 0$.

    (c) $A(x,u;\alpha_o,S) < L < 1$.

3. In view of the fact that $(x^S(\alpha'), u^S(\alpha')) = (x^*(\alpha'), u^*(\alpha'))$,
we may argue as in part 2 of the proof of Theorem 4.1 that (a), (b),
and (c) hold for $\alpha_o = \alpha'$ on some open neighborhood of
$(x^S(\alpha'), u^S(\alpha'))$.

4. Since $(x^S(\alpha), u^S(\alpha))$ is continuous on the closed interval
$I\alpha'$, and therefore uniformly continuous, one may assert the existence
of scalars $r > 0$ and $0 < \ell < \bar{\ell}$ such that (a), (b), (c) hold on
$N_r(x^S(\alpha_o), u^S(\alpha_o))$ when $\alpha_o \in [\alpha'-\ell, \alpha'+\ell]$.

By specializing Theorem 4.2 to $\alpha_o = \alpha'$, and recalling that
$(x^S(\alpha'), u^S(\alpha')) = (x^*(\alpha'), u^*(\alpha'))$ when $S$ is valid
at $\alpha'$, it is evident that Theorem 4.1 is still true if $\alpha_o$ is
permitted to be a point of change. Since $(x^*(\alpha), u^*(\alpha))$ is
continuous on [0,1], it is uniformly continuous on [0,1], and one
immediately obtains the following corollary of Theorem 4.1.

Corollary 4.1:

Let Conditions 1 through 4 hold, let $\alpha_o \in [0,1]$, and let $S$
be valid at $\alpha_o$.

Then there exists a scalar $\delta' > 0$, which does not depend on
$\alpha_o$ or on $S$, such that Newton's method applied to $(=S)\alpha_o$ is
well-defined and quadratically convergent to $(x^*(\alpha_o), u^*(\alpha_o))$
if the starting point is $(x^*(\alpha_o-\delta), u^*(\alpha_o-\delta))$ and
$|\delta| < \delta'$, $0 \le \alpha_o-\delta \le 1$.

A similar argument shows that Theorem 4.2 yields the following
corollary.

Corollary 4.2:

Let Conditions 1 through 4 hold, let $\alpha' \in [0,1]$ be a particular
point of change, and let $S$ be valid at $\alpha'$.

Then there exist scalars $\delta'' > 0$ and $0 < \ell'' < \bar{\ell}$, which do
not depend on $\alpha'$ or on $S$, such that Newton's method applied to
$(=S)\alpha_o$ is well-defined and quadratically convergent to
$(x^S(\alpha_o), u^S(\alpha_o))$ if $\alpha_o \in [\alpha'-\ell'', \alpha'+\ell'']$
and if $(x^*(\alpha_o-\delta), u^*(\alpha_o-\delta))$ is the starting point and
$|\delta| < \delta''$, $0 \le \alpha_o-\delta \le 1$.

3.2  The Basic Computational Algorithm

Using the results of the previous subsection, we can design a
computational counterpart of the Basic Conceptual Algorithm by using
Newton's method to solve $(=S)\alpha$ as $\alpha$ increases by steps of size
$\Delta\alpha$. A useful idealization is obtained by assuming that there is no
computational error. In view of the quadratic nature of the convergence
of Newton's method, it is no less plausible to assume that Newton's
method converges to an exact solution of $(=S)\alpha$ when it theoretically
should converge. 1/ An annotated flow chart of the Basic Computational
Algorithm is given in Figure 1.
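The overall loop of the algorithm can be sketched on a one-variable toy problem. Everything here is an assumption made for illustration: the problem (maximize $-(x-2\alpha)^2$ subject to $g_1(x) = 1 - x \ge 0$, whose only point of change is $\alpha' = 0.5$), the function names, and the tolerances. Because the objective is quadratic and the constraint linear, $(=S)\alpha$ is linear and is solved here in closed form as a stand-in for the Newton iteration.

```python
from itertools import combinations

def solve_eq_S(S, alpha):
    """Closed-form solution of the toy system (=S)alpha (stand-in for Newton)."""
    if 1 in S:                       # constraint 1 held as an equality: x = 1
        return 1.0, {1: 4.0 * alpha - 2.0}
    return 2.0 * alpha, {1: 0.0}     # no index in S: stationary point of f

def g(x):
    """Constraint values g_i(x) of the toy problem, keyed by constraint index."""
    return {1: 1.0 - x}

def is_valid(S, x, u, tol=1e-12):
    """Check (KT-3) for indices outside S and (KT-4) for indices in S."""
    gx = g(x)
    feasible = all(gx[i] >= -tol for i in gx if i not in S)
    optimal = all(u[i] >= -tol for i in S)
    return feasible and optimal

def candidates(A, B):
    """All sets S with A contained in S contained in B, smallest first."""
    free = sorted(B - A)
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            yield frozenset(A | set(extra))

def basic_algorithm(delta_alpha=0.1, eps=1e-6):
    steps = round(1.0 / delta_alpha)
    S0 = frozenset()                                 # valid set at alpha = 0
    x_prev, u_prev = solve_eq_S(S0, 0.0)
    history = [(0.0, x_prev, u_prev[1], S0)]
    for J in range(1, steps + 1):
        alpha = J * delta_alpha
        x, u = solve_eq_S(S0, alpha)                 # Step 2: keep the current set
        if not is_valid(S0, x, u):                   # Step 3: try other sets
            A = {i for i, ui in u_prev.items() if ui > eps}
            B = {i for i, gi in g(x_prev).items() if gi < eps}
            for S in candidates(A, B):
                if S == S0:
                    continue                         # already failed at Step 2
                x, u = solve_eq_S(S, alpha)
                if is_valid(S, x, u):
                    S0 = S
                    break
            else:
                raise RuntimeError("no candidate set is valid; adjust eps or delta_alpha")
        history.append((alpha, x, u[1], S0))
        x_prev, u_prev = x, u
    return history
```

With the default step the set changes from the empty set to {1} at the first grid point past $\alpha' = 0.5$, after a single trial at Step 3.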


Theorem 5:

Assume that Conditions 1 through 4 hold, that there is no
computational error, and that Newton's method converges to an exact
solution of $(=S)\alpha$ when it theoretically should converge.

Then there exist $\epsilon > 0$ and $\Delta\alpha > 0$ such that the Basic
Computational Algorithm is well-defined and will terminate with
$J\Delta\alpha = 1$ in a finite number of computational steps.

1/ This assumption is strictly true only when $f_1$ and $f_2$ are quadratic
polynomials and all constraints are linear, in which case $(=S)\alpha$ is a
set of linear equations in $(x,u)$; Newton's method therefore leads
to an exact solution in a single iteration.

Proof: Put

$$\epsilon_1 = \tfrac{1}{2} \min_{1 \le j \le N} \ \min_{i \in A\alpha'_j} \{u_i^*(\alpha'_j)\}.$$

By construction, $\epsilon_1 > 0$. By the uniform continuity of $u_i^*(\alpha)$
($i = 1,\ldots,m$) on [0,1], there exists a scalar $\delta_1 > 0$ such that
$|\alpha-\alpha'_j| \le \delta_1$ implies $|u_i^*(\alpha)-u_i^*(\alpha'_j)| \le \epsilon_1$
($i = 1,\ldots,m$) for any $j$ ($j = 1,\ldots,N$). Put

$$\epsilon_2 = \tfrac{1}{2} \min_{1 \le j \le N} \ \min_{i \in M - B\alpha'_j} g_i(x^*(\alpha'_j)).$$

By construction, $\epsilon_2 > 0$. By the uniform continuity of $g_i(x^*(\alpha))$
($i = 1,\ldots,m$) on [0,1], there exists a scalar $\delta_2 > 0$ such that
$|\alpha-\alpha'_j| \le \delta_2$ implies
$|g_i(x^*(\alpha)) - g_i(x^*(\alpha'_j))| \le \epsilon_2$ ($i = 1,\ldots,m$)
for any $j$ ($j = 1,\ldots,N$).

Put $\epsilon^* = \min\{\epsilon_1,\epsilon_2\}$ and $\Delta\alpha^* = 1/K$, where $K$
is the smallest integer satisfying
$K > 2/\min\{\delta_1, \delta_2, \delta', \delta'', \ell''\}$. In view of the Basic
Theorem, to prove this theorem it is sufficient to show that for these
choices of $\epsilon$ and $\Delta\alpha$ Newton's method is well-defined and sure to
be convergent as stated in Steps 2 and 3, and that the trials at Step 3
must lead to a success.


At each application of Newton's method during Step 2,
$(x^{J-1}, u^{J-1}) = (x^*((J-1)\Delta\alpha^*), u^*((J-1)\Delta\alpha^*))$ and $S^o$
is valid at $\alpha = (J-1)\Delta\alpha^*$. If $S^o$ is valid at $J\Delta\alpha^*$, then
since $\Delta\alpha^* < \delta'$ we have by Corollary 4.1 that Newton's method is
well-defined and convergent to $(x^*(J\Delta\alpha^*), u^*(J\Delta\alpha^*))$. If
$S^o$ is not valid at $J\Delta\alpha^*$, then since $\Delta\alpha^* < \ell'' < \bar{\ell}$
there must be exactly one point of change $\alpha' < 1$ on
$[(J-1)\Delta\alpha^*, J\Delta\alpha^*]$; but $S^o$ is valid at $\alpha'$,
$\Delta\alpha^* < \ell''$, and $\Delta\alpha^* < \delta''$, so by Corollary 4.2 Newton's
method is well-defined and convergent to
$(x^{S^o}(J\Delta\alpha^*), u^{S^o}(J\Delta\alpha^*))$, and Step 3 is entered. By
the choice of $\epsilon^*$, $A = A\alpha'$ and $B = B\alpha'$. Corollary 4.2 again
applies, and ensures that Newton's method is well-defined and convergent
to $(x^S(J\Delta\alpha^*), u^S(J\Delta\alpha^*))$ when $A \subseteq S \subseteq B$. The
trials are sure to lead to a success because some set which is valid at
$\alpha'$ must also be valid at $J\Delta\alpha^*$, since $\alpha'$ is the only point
of change on $[(J-1)\Delta\alpha^*, J\Delta\alpha^*]$.

A word is in order about the consequences of taking $\epsilon$ and
$\Delta\alpha$ different from $\epsilon^*$ and $\Delta\alpha^*$. This is of considerable
practical importance, since $\epsilon^*$ and $\Delta\alpha^*$ cannot be calculated
beforehand.

It is possible to give a detailed discussion of the difficulties
caused in the Basic Computational Algorithm by "poor" choices of $\epsilon$
and $\Delta\alpha$, but we shall limit the present discussion to a few general
remarks.

It is clear from the proof of the theorem that when $\epsilon = \epsilon^*$, any
$\Delta\alpha \le \Delta\alpha^*$ will do; in fact, to every $\epsilon$,
$0 < \epsilon \le \epsilon^*$, there exists $\Delta\alpha^*(\epsilon)$,
$0 < \Delta\alpha^*(\epsilon) \le \Delta\alpha^*$, such that the Basic Computational
Algorithm is well-defined and computationally finite when $\epsilon$ and
$\Delta\alpha$ are used
Figure 1. Flow Chart of the Basic Computational Algorithm

    Start:
        Solve $(P0)$. Put $(x^o, u^o, S^o)$ equal to a solution of (KT-1)
        through (KT-4) at $\alpha = 0$.
        Choose step size $\Delta\alpha > 0$. Choose $\epsilon > 0$. Put $J = 0$.

    Advance:
        Put $(x^{J-1}, u^{J-1}) = (x^J, u^J)$. Put $J = J+1$.
        Is $J\Delta\alpha > 1$? If yes, terminate.

    Step 2:
        Iterate from $(x^{J-1}, u^{J-1})$ to $(x^J, u^J)$, the solution† of
        $(=S^o)J\Delta\alpha$, by Newton's method.
        Is $g_i(x^J) \ge 0$, $\forall i \notin S^o$, and $u_i^J \ge 0$,
        $\forall i \in S^o$?
        If yes ($S^o$ is still valid at $J\Delta\alpha$): write
        $(x^*(J\Delta\alpha), u^*(J\Delta\alpha))$ and advance.
        If no ($S^o$ is not valid at $J\Delta\alpha$): go to Step 3.

    Step 3:
        Put $A = \{i : u_i^{J-1} > \epsilon\}$ and $B = \{i : g_i(x^{J-1}) < \epsilon\}$.
        Choose $S$ such that $A \subseteq S \subseteq B$ and $S$ not tried
        before at the current value of $J$.
        Iterate from $(x^{J-1}, u^{J-1})$ to $(x^J, u^J)$, the solution† of
        $(=S)J\Delta\alpha$, by Newton's method.
        Is $g_i(x^J) \ge 0$, $\forall i \notin S$, and $u_i^J \ge 0$,
        $\forall i \in S$?
        If yes (this trial was successful; $S$ is valid at $J\Delta\alpha$):
        put $S^o = S$, write $(x^*(J\Delta\alpha), u^*(J\Delta\alpha))$, and advance.
        If no (this trial was not successful; $S$ is not valid at
        $J\Delta\alpha$): choose another $S$.

† The notation used here is contradictory of that used elsewhere
in this work: $(x^J, u^J)$ actually means
$(x^{S^o}(J\Delta\alpha), u^{S^o}(J\Delta\alpha))$ at Step 2, for example.

and $\Delta\alpha \le \Delta\alpha^*(\epsilon)$. Thus $\epsilon$ and $\Delta\alpha$ need not be
exactly $\epsilon^*$ and $\Delta\alpha^*$ in order for the algorithm to be applicable.
In general, however, the following qualitative assertions hold: (a) when
$\epsilon$ is too small, there may be too few candidate sets at Step 3, i.e.,
there may be no set satisfying $A \subseteq S \subseteq B$ which is valid at
$J\Delta\alpha$, so that Step 3 cannot be successfully completed; (b) when
$\epsilon$ is too large, there may be too many candidate sets at Step 3,
resulting in an excessive number of trials before Step 3 is successfully
completed and possibly in the break-down of Newton's method (lack of
convergence or lack of existence of the required inverse matrix) for the
trial sets which are not valid at $J\Delta\alpha$ and do not satisfy the
hypotheses of Corollary 4.2 applied at the point of change just before
$J\Delta\alpha$; (c) when $\Delta\alpha$ is too small, the algorithm is applicable
but requires more executions of Step 2, i.e., more increments in $\alpha$,
thereby reducing the efficiency of the algorithm for a user who would be
satisfied with knowing $(x^*(\alpha), u^*(\alpha))$ for a coarser grid of values;
and (d) when $\Delta\alpha$ is too large, Newton's method is apt to be
ill-defined, or divergent, or convergent to the wrong solution of
$(=S)J\Delta\alpha$, and it could happen that there is no set satisfying
$A \subseteq S \subseteq B$ which is valid at $J\Delta\alpha$, so that Step 3
cannot be successfully completed.

It is evident that $\epsilon$ and $\Delta\alpha$ must be selected by trial and
error. A more powerful approach would be to modify $\epsilon$ and $\Delta\alpha$
adaptively as the computations proceed: one would provide for monitoring
the number of iterations used by Newton's method each time it is employed
and also the number of candidate sets at Step 3, and the basic strategy
would be to increase $\Delta\alpha$ and/or decrease $\epsilon$ when the algorithm is
making good progress and to decrease $\Delta\alpha$ and/or increase $\epsilon$ when
the algorithm encounters difficulty. Such an approach was applied
successfully in the design of the machine code used to solve the
parametric problem of Chapter IV.
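One possible form of such an adaptive rule is sketched below. It is not the rule used in the machine code of Chapter IV, which is not described here; the function name, the growth and shrink factors, and the thresholds are hypothetical choices made only to show the direction of each adjustment.

```python
def adapt(delta_alpha, eps, newton_iters, n_candidates, failed,
          grow=2.0, shrink=0.5, easy_iters=3):
    """Return updated (delta_alpha, eps) after one increment in alpha.

    newton_iters : iterations Newton's method needed at this increment
    n_candidates : number of candidate sets examined at Step 3 (0 if not entered)
    failed       : True if Newton broke down or Step 3 found no valid set
    All thresholds and factors are illustrative assumptions.
    """
    if failed or newton_iters > 2 * easy_iters:
        # Difficulty: take a smaller step and widen the candidate collection.
        return delta_alpha * shrink, eps * grow
    if newton_iters <= easy_iters and n_candidates <= 1:
        # Good progress: take a larger step and narrow the candidate collection.
        return delta_alpha * grow, eps * shrink
    return delta_alpha, eps
```

A caller would apply `adapt` after each increment and retry the increment with the new values whenever `failed` was true.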

In addition to the possibility of increasing computational
efficiency by adaptive selection of $\epsilon$ and $\Delta\alpha$, it is possible to
greatly improve computational efficiency by using refinement, bordering,
and partitioning methods for the inverse matrix required by Newton's
method. A discussion of some of these devices is given in Appendix C.
These devices, or others like them, should be incorporated into any
machine code for implementing the present algorithm, or the number of
matrix inversions required would probably preclude the use of Newton's
method.

4.  Further Study of Step 3

Step 3 of the Basic Conceptual Algorithm involves a certain amount
of trial and error: at the point of change $\alpha'$, try different sets
$S$ which are valid at $\alpha'$ (i.e., $A\alpha' \subseteq S \subseteq B\alpha'$)
until one is found which is valid to the right of $\alpha'$. When
$B\alpha'-A\alpha'$ is a singleton, then no erroneous trials will be made at
Step 3, for there are only two eligible sets, one of which was found at
Step 2 not to be valid to the right of $\alpha'$. When $B\alpha'-A\alpha'$
contains many constraint indices, however, many unsuccessful trials may
have to be made before a set which is valid to the right of $\alpha'$ is
found. It is therefore of interest to appraise how serious a difficulty
the trial and error nature of Step 3 is likely to be, and to consider
some ways of ameliorating this potential stumbling block.
It is possible to argue heuristically that $B\alpha'-A\alpha'$, which may be
referred to as the set of degenerate constraints at $\alpha'$, will
ordinarily consist of only one constraint. Let $\alpha_o \in [0,1]$ be fixed,
and assume that Conditions 1 through 4 hold. From the sufficiency of the
Kuhn-Tucker Theorem, it follows that $x^*(\alpha_o)$ also is the optimum
solution to the problem

$$\underset{x}{\text{Maximize}}\ f(x;\alpha_o) \quad \text{subject to} \quad g_i(x) \ge 0,\ \forall i \in A\alpha_o .$$

In other words, all constraints except those of $A\alpha_o$ are redundant.
The fact that some of them, namely those of $B\alpha_o-A\alpha_o$, happen to be
exactly satisfied at $x^*(\alpha_o)$ can be viewed as an "accident." It seems
more likely that a redundant constraint will be slack at $x^*(\alpha_o)$, as
those of $M-B\alpha_o$ are. If $\alpha_o$ is not a point of change, we conclude
that $B\alpha_o-A\alpha_o$ is likely to be empty ($B\alpha_o-A\alpha_o = \emptyset$
implies that there is exactly one valid set at $\alpha_o$). The set
$B\alpha_o-A\alpha_o$ is sure to contain at least one constraint, however, when
$\alpha_o$ is a point of change, for as $\alpha$ traverses the unit interval
continuity dictates that the only way a constraint can make the
transition from slack to active or conversely is to pass through
$B\alpha-A\alpha$. Unless there is strong interdependence between different
constraints, not more than one or two constraints are likely to be
involved in such a transition at any given point of change.

Remark: The last observation brings up an interesting point regarding
        the testing of new mathematical programming algorithms. Often
        a new algorithm is applied to a number of problems whose data
        were generated "randomly" in an effort to gain computational
        experience quickly and to judge the efficiency of the algorithm.
        In our case this procedure would very likely lead to results
        biased in favor of our algorithm. The reason, of course, is
        that "interdependence" between constraints is less likely to
        occur when problem data are generated randomly than when problem
        data derive from real applications; the result is that Step 3
        will rarely require any erroneous trials for problems with
        randomized data.

The above heuristic argument, although somewhat comforting, does
not preclude the possibility of $B\alpha'-A\alpha'$ being quite numerous (by
Condition 4, $B\alpha$ can be composed of at most $n$ constraint indices,
and so $B\alpha'-A\alpha'$ could have up to $n$ constraints). Faced with this
possibility, one may follow two main courses of inquiry. One may
attempt to construct methods of perturbing $(P\alpha)$ so as to ensure that
$B\alpha-A\alpha$ consists of only one or two constraints at each point of
change (see Markowitz, 1956, p. 125, and Zahl, 1964, p. 156).
Alternatively, one may attempt to devise rules for deciding in what
order the trials should be made at Step 3 (the Basic Conceptual
Algorithm is ambiguous in this respect) so as to tend to keep the number
of erroneous trials small. We choose to follow the second course of
inquiry, because (a) this type of investigation is conspicuously lacking
at present (for a notable exception in the context of a related problem
see Theil and Van de Panne, 1960), and (b) the second course of inquiry
must be undertaken before the need for perturbation can be established.
4.1  Preliminary Remarks on Determining the Order of Trials at Step 3

We begin by establishing some terminology. Suppose that Step 2
has ended with the point of change $\alpha' < 1$. Let $\alpha'+$ be a point
between $\alpha'$ and the next largest point of change. If $S$ is valid at
$\alpha'$ but not at $\alpha'+$, the unique continuous solution of $(=S)\alpha$
satisfying the left end-point value $(x^*(\alpha'), u^*(\alpha'))$ violates
either (KT-3) or (KT-4), or possibly both, as $\alpha$ increases above
$\alpha'$. In other words, $S$ "causes an alarm" as $\alpha$ increases 1/ above
$\alpha'$. A violation of (KT-3) is called a feasibility alarm, while a
violation of (KT-4) is called an optimality alarm. By continuity, the set
of feasibility alarms must be contained in $B\alpha'-S$, and the set of
optimality alarms must be contained in the set $S-A\alpha'$; hence all
alarms are from $B\alpha'-A\alpha'$. Since $S$ is not valid at $\alpha'+$, by
Corollary 1.2 either $\{S-B\alpha'+\} \neq \emptyset$ or
$\{A\alpha'+ - S\} \neq \emptyset$. The set $S-B\alpha'+$ will be called the
excess of $S$ at $\alpha'+$, and $A\alpha'+ - S$ will be called the deficiency
of $S$ at $\alpha'+$.

Clearly the smallest change in $S$ which will result in a set which
is valid at $\alpha'+$ is to delete its excess and add its deficiency. The
number of constraint indices of $\{A\alpha'+ - S\} \cup \{S-B\alpha'+\}$ is
therefore a measure of the minimum distance, 2/ which we denote by
$d(S)$, between $S$ and the collection of all sets which are valid at
$\alpha'+$.

1/ Since $x^S(\alpha)$ and $u^S(\alpha)$ are analytic functions, there is an
$\epsilon > 0$ such that each component of $(g(x^S(\alpha)), u^S(\alpha))$ has
constant sign on $(\alpha', \alpha'+\epsilon)$. It is in this sense that we define
the alarms caused by $S$ "as $\alpha$ increases above $\alpha'$."

2/ The distance between a set $C$ and a set $D$, where $C$ and $D$ are
both subsets of $M$, can be defined as the number of elements in the set
$\{C-D\} \cup \{D-C\}$. It is readily verified that this definition meets
all of the usual requirements of a distance metric and hence makes a
metric space out of the set of all subsets of $M$.
Figure 2 is designed to help the reader visualize the various
sets mentioned above for a hypothetical case, and it will be convenient
to refer to it occasionally during the rest of this section. Each dot
represents a constraint (fifteen in all). The constraints in $S$ are
circled to distinguish them from the others. Constraints 6, 8, and
10 are labelled "g" to signify that they are potential feasibility
alarms ($B\alpha'-S$), and constraints 7, 9, and 11 are labelled "u" to
signify that they are potential optimality alarms ($S-A\alpha'$). The
deficiency of $S$ at $\alpha'+$ is precisely constraint 6, and the excess
is constraint 11.
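The excess, the deficiency, and $d(S)$ are simple set operations, and the hypothetical case of Figure 2 can be used to illustrate them. In the sketch below the function names are illustrative, and the choice of constraints 1 through 5 as $A\alpha'$ is an assumption (the figure itself is not reproduced here); the remaining sets follow from the description of the figure in the text. A set is taken to be valid at $\alpha'+$ when it lies between $A\alpha'+$ and $B\alpha'+$.

```python
from itertools import combinations

def excess(S, B_plus):
    """Excess of S at alpha'+: indices of S that are not in B(alpha'+)."""
    return S - B_plus

def deficiency(S, A_plus):
    """Deficiency of S at alpha'+: indices of A(alpha'+) missing from S."""
    return A_plus - S

def set_distance(C, D):
    """Number of elements in (C - D) union (D - C)."""
    return len(C ^ D)

def d(S, A_plus, B_plus):
    """Minimum distance from S to the collection of sets valid at alpha'+."""
    return len(deficiency(S, A_plus) | excess(S, B_plus))

def valid_sets(A_plus, B_plus):
    """All sets V with A(alpha'+) contained in V contained in B(alpha'+)."""
    free = sorted(B_plus - A_plus)
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            yield A_plus | set(extra)

# Hypothetical data consistent with the description of Figure 2.
A_plus = set(range(1, 8))          # A(alpha'+) = {1, ..., 7}
B_plus = set(range(1, 10))         # B(alpha'+) = {1, ..., 9}; 8 and 9 degenerate
S = {1, 2, 3, 4, 5, 7, 9, 11}      # assumed A(alpha') = {1..5}, plus 7, 9, 11

closest = min(set_distance(S, V) for V in valid_sets(A_plus, B_plus))
```

Here the brute-force minimum `closest` agrees with `d(S, A_plus, B_plus)`: deleting constraint 11 and adding constraint 6 gives the nearest valid set.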

Can one guess, by observing which feasibility and optimality
alarms $S$ causes as $\alpha$ increases above $\alpha'$, what changes can be
made in $S$ in order for it to be valid at $\alpha'+$? It is tempting to
conjecture that any constraint (in $S$) which yields an optimality alarm
should be deleted from $S$, for it is well-known (e.g., see Wilde, 1962)
that a dual variable may be interpreted as giving the marginal decrease
of the value of the objective function with respect to an increase in
the "right-hand side" of the corresponding constraint. Similarly, it
is tempting to conjecture that any constraint (not in $S$) which yields
a feasibility alarm should be added to $S$ in order that it remain
satisfied as $\alpha$ increases above $\alpha'$. If this line of reasoning were
correct, then by deleting the constraints which yield optimality alarms
and adding those which yield feasibility alarms, one could obtain from
$S$ a set which is valid at $\alpha'+$; for the optimality alarms would
coincide with the excess of $S$ and the feasibility alarms would
coincide with the deficiency of $S$. Unfortunately this is not the case,
because the interactions between constraints which are degenerate at
$\alpha'$ have been ignored. It is therefore possible to construct simple
examples (see Appendix B) for which there are false and silent alarms.

By a false alarm we mean a feasibility alarm which is not from the
deficiency of $S$ at $\alpha'+$ and not from the set of degenerate
constraints at $\alpha'+$, or an optimality alarm which is not from the
excess of $S$ at $\alpha'+$ and not from the set of degenerate constraints
at $\alpha'+$. By a silent feasibility alarm we mean the absence of a
feasibility alarm from a constraint in the deficiency of $S$ at
$\alpha'+$, and by a silent optimality alarm we refer to the absence of an
optimality alarm from a constraint in the excess of $S$ at $\alpha'+$. In
terms of Figure 2, a false feasibility alarm would be an alarm from
constraint number 10, a false optimality alarm would be an alarm from 7,
a silent feasibility alarm would be the absence of an alarm from 6, and
a silent optimality alarm would be the absence of an alarm from 11. Note
that the alarms from the set of constraints which are degenerate at
$\alpha'+$ ($B\alpha'+ - A\alpha'+$), if any, are immaterial, for the presence
or absence of these constraints (numbers 8 and 9 in Figure 2) for a
trial set does not affect its validity at $\alpha'+$.
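These definitions can be restated as set differences. The sketch below classifies a given collection of alarms; the function name and the particular alarms chosen (feasibility alarms from 6 and 10, optimality alarms from 7 and 9) are hypothetical, and the sets $A\alpha'+$, $B\alpha'+$, and $S$ are the same assumed reading of Figure 2 as before, with constraints 1 through 5 taken as $A\alpha'$.

```python
def classify_alarms(feas_alarms, opt_alarms, S, A_plus, B_plus):
    """Sort the alarms caused by S into false, silent, and informative ones."""
    deficiency = A_plus - S            # should have been added to S
    excess = S - B_plus                # should have been deleted from S
    degenerate = B_plus - A_plus       # immaterial to validity at alpha'+
    return {
        "false_feasibility": feas_alarms - deficiency - degenerate,
        "false_optimality": opt_alarms - excess - degenerate,
        "silent_feasibility": deficiency - feas_alarms,
        "silent_optimality": excess - opt_alarms,
        # Alarms that really do come from the deficiency or the excess.
        "informative": (feas_alarms & deficiency) | (opt_alarms & excess),
    }

A_plus = set(range(1, 8))              # assumed A(alpha'+)
B_plus = set(range(1, 10))             # assumed B(alpha'+)
S = {1, 2, 3, 4, 5, 7, 9, 11}          # assumed trial set of Figure 2

report = classify_alarms({6, 10}, {7, 9}, S, A_plus, B_plus)
```

In this hypothetical outcome the alarms from 10 and 7 are false, the alarm from 9 is immaterial, constraint 11 is a silent optimality alarm, and only the alarm from 6 points at a change that is actually needed.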

The above remarks indicate that not very much information about
what constitutes a valid set at $\alpha'+$ can be gleaned from a trial which
fails at Step 3. Evidently the statement of Corollary 1.1 that
$A\alpha' \subseteq A\alpha'+ \subseteq B\alpha'+ \subseteq B\alpha'$ is about as
strong a statement as can be made. As has already been pointed out, this
is already a very strong statement in the likely event that there are
only a few degenerate constraints at $\alpha'$. Yet when a trial set fails
at Step 3 there is one clue to the identity of a set which is valid at
$\alpha'+$ that can be salvaged: at least one of the alarms given during a
failure is from the deficiency or excess at $\alpha'+$ of the trial set. In
the next subsection we shall prove this fact. The result will then be
used to devise an ordering of trials at Step 3.

4.2  Sharpening Corollary 2.1

Lemma 6.1:

Let $\alpha' \in [0,1]$ be a point of change, let $S$ be valid at $\alpha'$,
and assume that Conditions 1 through 4 hold.

Then there exists a convex set $X' \supset X$ and an open interval
containing and symmetric about $\alpha'$ and contained in $I\alpha'$ such
that, for each fixed value of $\alpha$ in this interval, $x^S(\alpha)$ is
the optimal solution of

$$\underset{x \in X'}{\text{Maximize}}\ f(x;\alpha) \quad \text{subject to} \quad g_i(x) = 0,\ \forall i \in \{S - S^+\alpha\},$$
$$\qquad\qquad\qquad\qquad\qquad\qquad\quad g_i(x) \ge 0,\ \forall i \in S^+\alpha ,$$

where $S^+\alpha \subseteq \{i \in S : u_i^S(\alpha) \ge 0\}$.

Proof: Arguing as in Proposition 3 and using the continuity of
$u_i^S(\alpha)$ and the fact that $u^S(\alpha') = u^*(\alpha') \ge 0$, one obtains
(here we employ
the  notations  of  Proposition  3)  that  Max  S  ('i'  Yy  8j  (x) ) ) 

P  X  ^  1  X 
is negative on $X \times \alpha'$ and continuous on some open region
containing this direct product set. By the compactness and convexity of
$X \times \alpha'$, it follows that the hessian of the Lagrangian function
$f(x;\alpha) + \sum_{i=1}^{m} u_i^S(\alpha)\, g_i(x)$ is negative definite on
some open convex region $X' \times I'\alpha'$ containing $X \times \alpha'$. In
view of A.3, the Lagrangian function must be strictly concave with
respect to $x$ on the open convex set $X'$ for each fixed value of
$\alpha \in I'\alpha'$.

Now $x^S(\alpha') = x^*(\alpha') \in X \subset X'$, $X'$ open; since $x^S(\alpha)$
is continuous on $I\alpha'$, one obtains that $x^S(\alpha) \in X'$ for all
$\alpha$ sufficiently near $\alpha'$. Since the gradient with respect to $x$ of
the Lagrangian function vanishes at $x^S(\alpha)$, we conclude by A.6 that
$x^S(\alpha)$ is the global maximum of that function on the convex set $X'$
for any fixed $\alpha$ sufficiently near $\alpha'$. Using the fact that
$u_i^S(\alpha) = 0$, $\forall i \notin S$, and $g_i(x^S(\alpha)) = 0$,
$\forall i \in S$, one obtains, for any fixed $\alpha$ sufficiently near
$\alpha'$, that

$$(14)\qquad f(x^S(\alpha);\alpha) \ \ge\ f(x;\alpha) + \sum_{i=1}^{m} u_i^S(\alpha)\, g_i(x), \quad \forall x \in X' .$$

Since $\sum_{i=1}^{m} u_i^S(\alpha)\, g_i(x) \ge 0$ for all $x$ such that
$g_i(x) = 0$, $\forall i \in \{S-S^+\alpha\}$, and $g_i(x) \ge 0$,
$\forall i \in S^+\alpha$, where
$S^+\alpha \subseteq \{i \in S : u_i^S(\alpha) \ge 0\}$, the conclusion of the
lemma follows from (14).


Remark: An easy proof of this lemma can be constructed from the
        Kuhn-Tucker Theorem when all constraints are linear; in this
        case $X'$ may be taken to be $E^n$. When all constraints are
        linear, specialization of the Kuhn-Tucker Theorem reveals that
        $(=S)\alpha$ are necessary and sufficient conditions for a maximum
        of $f(x;\alpha)$ subject to $g_i(x) = 0$, $\forall i \in S$.

Remark: The region $X'$ may be taken to be contained in the open region
        mentioned in Condition 1.

Theorem 6:

Let $\alpha' \in [0,1]$ be a point of change, let $S$ be valid at $\alpha'$,
and assume that Conditions 1 through 4 hold.

Then there exists an open interval containing and symmetric about
$\alpha'$ and contained in $I\alpha'$ such that, for each fixed value of
$\alpha$ in this interval, the following three assertions are equivalent:

    (i)   $S$ is valid at $\alpha$.

    (ii)  $(x^S(\alpha), u^S(\alpha)) = (x^*(\alpha), u^*(\alpha))$.

    (iii) $g_i(x^S(\alpha)) \ge 0$, $\forall i \in \{A\alpha-S\}$, and
          $u_i^S(\alpha) \ge 0$, $\forall i \in \{S-B\alpha\}$.

Proof: The equivalence of (i) and (ii) and the fact that (ii)
implies (iii) are known from Corollary 2.1. To complete the proof of
the theorem, it is sufficient to show that (iii) implies (ii) on the
interval mentioned in Lemma 6.1.

Assume that (iii) holds for some fixed value of $\alpha$ in the interval
mentioned in Lemma 6.1. Using the assumption that $u_i^S(\alpha) \ge 0$,
$\forall i \in \{S-B\alpha\}$, and applying Lemma 6.1 with
$S^+\alpha = \{S-B\alpha\}$, one may assert the existence of a convex set
$X' \supset X$ such that $x^S(\alpha)$ is an optimal solution of

$$(15)\qquad \underset{x \in X'}{\text{Maximize}}\ f(x;\alpha) \quad \text{subject to} \quad g_i(x) = 0,\ \forall i \in \{B\alpha \cap S\},$$
$$\qquad\qquad\qquad\qquad\qquad\qquad\qquad\quad g_i(x) \ge 0,\ \forall i \in \{S-B\alpha\} .$$

Using the assumption that $g_i(x^S(\alpha)) \ge 0$, $\forall i \in \{A\alpha-S\}$,
we have that $x^S(\alpha)$ is feasible in

$$(16)\qquad \underset{x \in X'}{\text{Maximize}}\ f(x;\alpha) \quad \text{subject to} \quad g_i(x) = 0,\ \forall i \in \{B\alpha \cap S\},$$
$$\qquad\qquad\qquad\qquad\qquad\qquad\qquad\quad g_i(x) \ge 0,\ \forall i \in \{S-B\alpha\} \cup \{A\alpha-S\} .$$

Since the feasible region of (16) is included in that of (15),
$x^S(\alpha)$ must be an optimal solution of (16).

It follows from A.4 and A.6 and the fact that $(x^*(\alpha), u^*(\alpha))$
satisfies $(=A\alpha)\alpha$ that $x^*(\alpha)$ is optimal in

$$(17)\qquad \underset{x \in X'}{\text{Maximize}}\ f(x;\alpha) \quad \text{subject to} \quad g_i(x) \ge 0,\ \forall i \in A\alpha .$$

Since the feasible region of (16) is included in that of (17), and
since $x^*(\alpha)$ is feasible in (16), $x^*(\alpha)$ must be optimal in (16).
That is, both $x^*(\alpha)$ and $x^S(\alpha)$ are optimal in (16); thus
$f(x^*(\alpha);\alpha) = f(x^S(\alpha);\alpha)$. Because $x^S(\alpha)$ is feasible
in (17), we finally have that $x^S(\alpha)$ is optimal in (17). Since (17)
must have a unique optimal solution by A.2, $x^S(\alpha) = x^*(\alpha)$. This
implies, by Condition 4, that $u^S(\alpha) = u^*(\alpha)$. Thus (ii) holds.

The significance of this sharpening of Corollary 2.1 is that
it rules out the possibility that all alarms are either false or from
the set of degenerate constraints at $\alpha'+$ when $S$ is not valid at
$\alpha'+$. That is, at least one alarm is from the deficiency or excess
of $S$ at $\alpha'+$.
4.3 Modification of Step 3: Determining the Order of Trials

Suppose that Step 2 has ended with the point of change $\alpha' < 1$.
Designate the set of alarms which are given by $S^0$ (the set used
during Step 2) as $\alpha$ increases above $\alpha'$ by $T$. Applying Theorem 6
at $\alpha'$, we know that at least one of the alarms is from the excess
or deficiency of $S^0$ at $\alpha'+$. Unfortunately, we do not know which
one. A logical way of proceeding at Step 3 is to modify $S^0$ by one
constraint at a time for each constraint in $T$, i.e., try the sets
$S^0 \pm i$ for each $i \in T$, where the symbol $S^0 \pm i$ means $S^0 \cup i$
if $i \notin S^0$ and $S^0 - i$ if $i \in S^0$. This notation is designed to
avoid having to distinguish between feasibility and optimality alarms.
In other words, add the constraints which were feasibility alarms to
$S^0$ and delete constraints which were optimality alarms from $S^0$ one
at a time until each alarm has been heeded individually. Note that
$S^0 \pm i$, $i \in T$, is valid at $\alpha'$ since all alarms caused by a set
which is valid at $\alpha'$ must be from $B\alpha' - A\alpha'$. Hence $S^0 \pm i$,
$i \in T$, satisfies (8.1).

When $T$ has been exhausted by this first generation of trials,
at least one trial set, say $S^0 \pm i_0$, is one unit of distance closer
to a valid set at $\alpha'+$. If $d(S^0) = 1$ then $S^0 \pm i_0$ is valid at
$\alpha'+$ and Step 3 has been successfully completed. If $d(S^0) > 1$ then
$d(S^0 \pm i_0) = d(S^0) - 1 > 0$, and a second generation of trials is
necessary. At each first generation trial, let $T_i$ denote the
alarms due to $S^0 \pm i$, $i \in T$. At the second generation one should
try $S^0 \pm i \pm j$ for all $i \in T$ and all $j \in T_i$. The symbol
$S^0 \pm i \pm j$ means $\{S^0 \pm i\} \cup j$ if $j \notin S^0 \pm i$ and
$\{S^0 \pm i\} - j$ if $j \in S^0 \pm i$. Applying Theorem 6 at $\alpha'$ with
$S = S^0 \pm i_0$, we see that at least one of the alarms due to
$S^0 \pm i_0$ is from the excess or deficiency of $S^0 \pm i_0$ at $\alpha'+$,
but we do not know which one. Hence at least one of the sets
$S^0 \pm i_0 \pm j$, $j \in T_{i_0}$, is one unit of distance closer to
a set which is valid at $\alpha'+$. Designate one such set by
$S^0 \pm i_0 \pm j_0$. If $d(S^0) = 2$ then $S^0 \pm i_0 \pm j_0$ is valid at
$\alpha'+$, and Step 3 has been successfully completed. If $d(S^0) > 2$, then
$d(S^0 \pm i_0 \pm j_0) = d(S^0) - 2 > 0$, and a third generation of trials
is necessary.

The third generation of trials is constructed in a manner analogous
to the preceding generations, and so on for the higher order generations.
If at any trial a set is encountered which has been tried before, it
may, of course, be discarded.

At each generation the distance from some trial set, and perhaps
several, to the collection of all sets which are valid at $\alpha'+$ is
decreased by one unit. Since $d(S^0)$ is finite (in fact it is bounded
by the number of constraints in $B\alpha' - A\alpha'$ minus the number of
constraints in $B\alpha'+ - A\alpha'+$), after a finite number of generations
of trials a set which is valid at $\alpha'+$ will be obtained (after exactly
$d(S^0)$ generations, in fact). The nearest valid set is, it will be
recalled, $S^0$ plus its deficiency at $\alpha'+$ minus its excess at $\alpha'+$.
These rules are summarized below.

Order of Trials at Step 3

1.  Let $T$ denote the alarms which are given by $S^0$ as $\alpha$
    increases above $\alpha'$. At the first generation of trials,
    try $S^0 \pm i$ for each $i \in T$. Let $T_i$ denote the set of
    alarms which are given by $S^0 \pm i$, $i \in T$, as $\alpha$ increases
    above $\alpha'$. If $T_{i^*} = \emptyset$ for some $i^* \in T$, then
    $S^0 \pm i^*$ is valid at $\alpha'+$, and Step 3 has been completed;
    otherwise, go on to the second generation of trials.

2.  At the second generation of trials, try $S^0 \pm i \pm j$ for
    each $i \in T$ and all $j \in T_i$. Let $T_{ij}$ be the set of alarms
    which are given by $S^0 \pm i \pm j$, $i \in T$ and $j \in T_i$, as $\alpha$
    increases above $\alpha'$. If $T_{i^*j^*} = \emptyset$ for some $i^* \in T$ and
    $j^* \in T_{i^*}$, then $S^0 \pm i^* \pm j^*$ is valid at $\alpha'+$, and Step 3
    has been completed; otherwise, go on to a third generation
    of trials.

Etc. (Omit any sets which have been tried previously.)
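
The rules above amount to a breadth-first search over trial sets, one
generation at a time. The following Python sketch is offered only as an
illustration under stated assumptions: the function `alarms` is a
hypothetical oracle that returns the set of constraint indices giving
an alarm when $(=S)\alpha$ is solved just above $\alpha'$, and the names
`toggle` and `order_of_trials` are introduced here and do not appear in
the text.

```python
from typing import Callable, FrozenSet, Optional, Set, Tuple


def toggle(s: FrozenSet[int], i: int) -> FrozenSet[int]:
    """Return S +/- i: add i if it is absent from S, delete it if present."""
    return s - {i} if i in s else s | {i}


def order_of_trials(
    s0: FrozenSet[int],
    alarms: Callable[[FrozenSet[int]], Set[int]],
    max_generations: int = 10,
) -> Tuple[Optional[FrozenSet[int]], int]:
    """Search generation by generation for a set that gives no alarms.

    `alarms` is a hypothetical oracle (an assumption of this sketch).
    Returns (valid_set, generation), or (None, max_generations) if no
    alarm-free set is found within the allowed number of generations.
    """
    if not alarms(s0):
        return s0, 0
    tried = {s0}
    frontier = [s0]
    for generation in range(1, max_generations + 1):
        next_frontier = []
        for s in frontier:
            # Heed each alarm of s individually, one constraint at a time.
            for i in sorted(alarms(s)):
                trial = toggle(s, i)
                if trial in tried:
                    continue  # omit sets which have been tried previously
                tried.add(trial)
                if not alarms(trial):
                    return trial, generation
                next_frontier.append(trial)
        frontier = next_frontier
    return None, max_generations


if __name__ == "__main__":
    # Idealized oracle with no false or silent alarms: the alarms of S are
    # exactly the constraints in which S differs from the valid set.
    target = frozenset({2, 3})
    print(order_of_trials(frozenset({1}), lambda s: set(s ^ target)))
```

In this idealized setting the search stops after a number of generations
equal to the distance between the starting set and the valid set; with
false or silent alarms the oracle would behave differently, which is
why the sketch keeps it as a separate, replaceable function.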

Since the only modification of Step 3 being suggested here is a
more complete specification of the order in which the trial sets are
to be considered, and since this order has been shown to lead to a
successful completion of Step 3, the assertions of the Basic Theorem
still apply to the Basic Conceptual Algorithm with Step 3 modified as
above.

If these rules are to be incorporated into the Basic Computational
Algorithm, then in order to ensure that Theorem 6 (and hence the above
rules) applies, it is necessary to take $\Delta\alpha$ less than one-half the
length of the smallest of the intervals of Theorem 6 applied at each
point of change.


We do not hold that the order of trials suggested here is the most
efficient order which can be devised. However, the following advantages
are to be noted:

(1)  Each unsuccessful trial helps to determine the order of
     successive trials.

(2)  The suggested order of trials always leads to the (unique)
     valid set nearest $S^0$.

(3)  A valid set is found after exactly $d(S^0)$ generations of
     trials. In this sense search termination is predictable,
     although not a priori so.

(4)  $S^0$ is deformed one constraint at a time from trial to
     trial, so that the computational machinery is upset the
     least amount possible.

5. Some Extensions

5.1  Linear  Equality  Constraints 

Let the constraints of $(P\alpha)$ include some linear equality
constraints. It is clear that if each such constraint is written as
a pair of inequality constraints (i.e., if the pair $g_i(x) \ge 0$,
$-g_i(x) \ge 0$ is written in place of $g_i(x) = 0$), then Condition 4 never
holds. Fortunately, it can be shown that a simple modification of
the Basic Conceptual Algorithm obviates this difficulty: always include
the linear equality constraints in $S^0$ at Step 2 and in the trial
sets at Step 3 (ignore any optimality alarms that such constraints may
give). If all of the constraints happen to be linear equalities, in
fact, Step 3 would disappear entirely.


5.2  More  General  Parametric  Problems 

With appropriate modifications of the four conditions, it can be
shown (Geoffrion, 1965) that many of the results of this chapter apply
to any one-dimensional perturbation of

(Pp)  Maximize over $x$:  $f(x,p)$  subject to  $g(x,p) \ge 0$,

where the parameter vector $p$ varies over a convex set $P$,
$f(x,p)$ is continuous in $(x,p)$ and strictly concave in $x$ for
each $p \in P$, and $g_i(x,p)$ ($i = 1, \ldots, m$) is concave in $(x,p)$. By
a one-dimensional perturbation of (Pp) we mean a parametric problem
of the form

      Maximize over $x$:  $f(x, p' + \alpha(p'' - p'))$
      subject to  $g(x, p' + \alpha(p'' - p')) \ge 0$

for each value of $\alpha \in [0,1]$, where $p', p'' \in P$.

It is evident that (Pp) is general enough to include many of
the parametric problems of interest to those who wish to perform
sensitivity analysis on concave programming problems.


CHAPTER  IV 

An  Illustrative  Example 

A simple model of a firm will be used to illustrate the manipulation
and solution of a decision problem under uncertainty by means
of the techniques presented in the preceding three chapters.

1.  A  Decision  Problem  Under  Uncertainty 

Consider a hypothetical firm which produces and sells n products
in an imperfectly competitive market. Assume that the cost of producing
and selling each unit of product i is $c_i$ dollars per unit, and
that the total dollar revenue accruing from the sale of $x_i$ units of
product i is $r_i(x_i) = (a_i + b_i p - d_i) x_i + (d_i/k_i) \ln(k_i x_i + 1)$, where
$a_i$, $d_i$, $k_i$ are positive scalars, $\ln(\cdot)$ denotes the natural log,
and $p$ is a price index. The interpretation of $r_i(x_i)$ becomes clearer
if one examines $dr_i(x_i)/dx_i = a_i + b_i p - d_i + d_i/(k_i x_i + 1)$. Since
$dr_i(0)/dx_i = a_i + b_i p$ and $dr_i(\infty)/dx_i = a_i + b_i p - d_i$, we see that price
gradually decreases from $a_i + b_i p$ (notice the linear dependence on
the price index) to $a_i + b_i p - d_i$ dollars per unit as production increases
without bound. The value of $k_i$ determines the rapidity of the price
decrease, and it is easily shown that a proportion $0 < t < 1$ of the
total possible price decrease $d_i$ is achieved at $x_i = t/[(1-t) k_i]$.
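
The last claim can be checked numerically. The Python sketch below is
only an illustration: the coefficient values are hypothetical and are
not taken from the text, and the function names are introduced here.
It evaluates the marginal revenue and confirms that a proportion $t$ of
the total price decrease is reached at $x = t/[(1-t)k]$.

```python
import math


def revenue(x, a, b, d, k, p):
    """Total revenue r(x) = (a + b*p - d)*x + (d/k)*ln(k*x + 1)."""
    return (a + b * p - d) * x + (d / k) * math.log(k * x + 1.0)


def marginal_revenue(x, a, b, d, k, p):
    """Derivative dr/dx = a + b*p - d + d/(k*x + 1)."""
    return a + b * p - d + d / (k * x + 1.0)


def output_for_fraction(t, k):
    """Output at which a proportion t (0 < t < 1) of the decrease d is reached."""
    return t / ((1.0 - t) * k)


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only for illustration.
    a, b, d, k, p = 10.0, 0.5, 2.0, 0.1, 0.0
    t = 0.75
    x_t = output_for_fraction(t, k)
    drop = marginal_revenue(0.0, a, b, d, k, p) - marginal_revenue(x_t, a, b, d, k, p)
    print("output level:", x_t)
    print("fraction of total price decrease:", drop / d)
```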

If we denote the (short-run) resource and other constraints
(including $x \ge 0$) by $g(x) \ge 0$, then assuming that the firm can
sell all it produces the profit maximization problem is

(1)  Maximize over $x$:  $\sum_{i=1}^n [(a_i + b_i p - d_i - c_i) x_i + (d_i/k_i) \ln(k_i x_i + 1)]$
     subject to  $g(x) \ge 0$.

We shall assume that all functions and coefficients are known except
the price index $p$, which will be regarded as a random variable with
a known cumulative distribution function $\Phi(p)$.


2.  Circumventing  Uncertainty  by  a  Vector  Maximum  Reformulation 

In order to circumvent the uncertainty attending the objective
function of (1), we elect to employ one of the approaches considered
at some length in Chapter I: a vector maximum reformulation using
the expected value criterion and the maximum .05-fractile criterion
(some fractile other than the .05-fractile could be used if desired).
Assume that $\Phi(p)$ is continuous, strictly increasing on the entire
real line, and that its mean is zero (if the mean is not zero, it
can be incorporated into the $a_i$). One derives that the mean and
.05-fractile of the objective function for fixed $x$ are, respectively,

    $f_1(x) = \sum_{i=1}^n [(a_i - d_i - c_i) x_i + (d_i/k_i) \ln(k_i x_i + 1)]$

    $f_2(x) = f_1(x) + \Phi^{-1}(.05) \sum_{i=1}^n b_i x_i$   if $\sum_{i=1}^n b_i x_i \ge 0$
    $f_2(x) = f_1(x) + \Phi^{-1}(.95) \sum_{i=1}^n b_i x_i$   if $\sum_{i=1}^n b_i x_i \le 0$.
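
As an illustration of the two criteria, the following Python sketch
evaluates the mean and the .05-fractile of profit for a fixed
production vector. It assumes, for concreteness, that the price index
is standard normal; the coefficient values in the example are
hypothetical, and the function names `f1` and `f2` simply mirror the
symbols used above.

```python
import math
from statistics import NormalDist


def f1(x, a, c, d, k):
    """Mean profit when the price index has zero mean."""
    return sum(
        (ai - di - ci) * xi + (di / ki) * math.log(ki * xi + 1.0)
        for xi, ai, ci, di, ki in zip(x, a, c, d, k)
    )


def f2(x, a, b, c, d, k, q=0.05, dist=NormalDist()):
    """q-fractile of profit; `dist` (standard normal by default) is an assumption."""
    s = sum(bi * xi for bi, xi in zip(b, x))
    # The fractile of p that applies depends on the sign of sum(b_i * x_i).
    z = dist.inv_cdf(q) if s >= 0 else dist.inv_cdf(1.0 - q)
    return f1(x, a, c, d, k) + z * s


if __name__ == "__main__":
    # Hypothetical two-product data, chosen only for illustration.
    a, b = [10.0, 12.0], [0.1, 0.2]
    c, d, k = [8.0, 10.0], [2.5, 2.5], [0.1, 0.1]
    x = [5.0, 5.0]
    print("mean profit:", f1(x, a, c, d, k))
    print(".05-fractile of profit:", f2(x, a, b, c, d, k))
```

The branch on the sign of the weighted sum is the "logical condition"
in the definition of $f_2$: when the sum is negative, a high value of
the price index is the unfavorable event, so the upper fractile of $p$
is the relevant one.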


In place of (1) we consider the vector maximum problem

(2)  "Maximize" over $x$:  $f_1(x)$, $f_2(x)$
     subject to  $g(x) \ge 0$.

The efficient outcomes of (2) are to be computed and plotted (as in
Figure 3 below) so as to present a "tradeoff curve" between the two
criteria. A decision-maker then subjectively determines a point on
the tradeoff curve, and implements the corresponding optimal production
schedule.

3. An Equivalent Parametric Programming Reformulation

The hessian of $f_1$ is a diagonal matrix, with $-k_i d_i/(k_i x_i + 1)^2$
on the diagonal. When $x \ge 0$, the assumed positivity of $k_i$ and
$d_i$ implies that this hessian is negative definite. By A.3, therefore,
$f_1(x)$ is seen to be strictly concave on the non-negative orthant.
An enumeration of cases shows that $f_2(x)$ is also strictly concave
on the non-negative orthant when $\Phi^{-1}(.05) < 0$ and $\Phi^{-1}(.95) > 0$.
In view of our assumption that the mean is 0, it is reasonable to
assume that this last condition holds. Assuming further that each
constraint function is concave, we conclude that Proposition 6 of
Chapter II applies. 1/ Hence to find all efficient solutions of (2)
it is equivalent to find the optimal solutions of

(3)  Maximize over $x$:  $(1-\alpha) f_1(x) + \alpha f_2(x)$
     subject to  $g(x) \ge 0$

for each value of $\alpha$ in the unit interval.
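
The idea of tracing the efficient set by sweeping the weight can be
shown on a toy problem. The Python sketch below is not the thesis's
algorithm: it uses a one-dimensional decision variable, two
hypothetical strictly concave criteria, and a simple ternary search
(adequate for a strictly concave function on an interval) in place of
the parametric method of Chapter III.

```python
def argmax_concave(h, lo, hi, tol=1e-9):
    """Maximize a strictly concave function h on [lo, hi] by ternary search."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if h(m1) < h(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def trace_tradeoff(f1, f2, lo, hi, alphas):
    """For each weight alpha, maximize (1 - alpha)*f1 + alpha*f2 on [lo, hi]."""
    curve = []
    for alpha in alphas:
        x = argmax_concave(lambda t: (1.0 - alpha) * f1(t) + alpha * f2(t), lo, hi)
        curve.append((alpha, x, f1(x), f2(x)))
    return curve


def crit1(x):
    """Hypothetical first criterion, maximized at x = 1."""
    return -(x - 1.0) ** 2


def crit2(x):
    """Hypothetical second criterion, maximized at x = -1."""
    return -(x + 1.0) ** 2


if __name__ == "__main__":
    for alpha, x, v1, v2 in trace_tradeoff(crit1, crit2, -2.0, 2.0, [0.0, 0.25, 0.5, 0.75, 1.0]):
        print(alpha, round(x, 6), round(v1, 6), round(v2, 6))
```

For these two criteria the weighted maximizer moves linearly from the
maximizer of the first criterion to that of the second as the weight
goes from 0 to 1, so the first criterion falls while the second rises
along the curve.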

Consider (3) with $\alpha$ fixed. The presence of the logical condition
in the definition of $f_2$ makes the solution of (3) somewhat

1/ It is easy to see that Proposition 6 still holds if the $f_i$ are
assumed to be strictly concave on $X$, and not necessarily on $E^n$.


awkward. One approach is to solve the pair of problems

(4)  Maximize over $x$:  $(1-\alpha) f_1(x) + \alpha [f_1(x) + \Phi^{-1}(.05) \sum_{i=1}^n b_i x_i]$
     subject to  $g(x) \ge 0$
                 $\sum_{i=1}^n b_i x_i \ge 0$

(5)  Maximize over $x$:  $(1-\alpha) f_1(x) + \alpha [f_1(x) + \Phi^{-1}(.95) \sum_{i=1}^n b_i x_i]$
     subject to  $g(x) \ge 0$
                 $\sum_{i=1}^n b_i x_i \le 0$.

The optimal value of (3) equals the larger of the optimal values
of (4) and (5), since the feasible regions of (4) and (5) are merely
a dichotomy of that of (3). We shall avoid this complication, however,
by requiring of our numerical example that $b_i \ge 0$ ($i = 1, \ldots, n$);
since $x \ge 0$, this condition implies that $\sum_{i=1}^n b_i x_i \ge 0$; therefore
(3) may be rewritten as

(6)  Maximize over $x$:  $(1-\alpha) f_1(x) + \alpha [f_1(x) + \Phi^{-1}(.05) \sum_{i=1}^n b_i x_i]$
     subject to  $g(x) \ge 0$.

4. Solving the Parametric Problem

We shall solve a numerical example based on (6) with n = 4
and m = 7. Table 1 gives the numerical data for the objective
function, and the constraints 2/ are:

    $x_i \ge 0$, $i = 1, \ldots, 4$
    $- .01x_1 - .01x_2 - .04x_3 - .04x_4 + 2 \ge 0$
    $+ x_1 - .4x_2 - .1x_3 - .1x_4 + 20 \ge 0$
    $- .01x_1^2 - .01x_2^2 - .01x_3^2 - .01x_4^2 + 15 \ge 0$

              i = 1     i = 2     i = 3     i = 4

    $a_i$     10.0      12.0      10.5      11.0
    $b_i$     0.0634    0.0950    0.6740    0.7540
    $c_i$     8.0       10.0      8.5       9.0
    $d_i$     2.50      2.55      2.20      2.25
    $k_i$     0.12      0.13      0.045     0.050

                        Table 1 3/

It is further assumed that $p$ is normally distributed with zero
mean and unit variance. Hence $\Phi^{-1}(.05) = -1.64$.

It is clear, since $k_i > 0$ ($i = 1, \ldots, 4$), that $f_1$, $f_2$
and $g_i$ ($i = 1, \ldots, 7$) are analytic on some open region containing the
non-negative orthant. Because the constraints are concave, therefore,

2/ Each $x_i$ represents hundreds of units of product i. The last
three constraints are to be interpreted as constraints on three
resources, which we refer to as resources A, B, and C respectively.
Resources are measured in thousands of units.

3/ The units of the coefficients are such that $f_1$ and $f_2$ are in
thousands of dollars.
Condition 1 of Chapter III is satisfied. Since resources are limited
in all real problems of this type, Condition 2 is not restrictive,
and in fact holds for the feasible region of our numerical example.
It was observed above that the hessian of $f_1$ is negative definite
on the non-negative orthant, and the same is true for $f_2$, so that
Condition 3 holds. We shall not bother to verify whether Condition 4
is satisfied by our numerical example.

A version of the Basic Computational Algorithm for solving (6)
was coded for the Burroughs B5000 computer. No attempt was made to
optimize program efficiency beyond the incorporation of a simple
variable step size feature (see the last two paragraphs of section 3,
Chapter III). The results of the computation are presented in
Figures 1, 2, and 3. Figure 1 is a graph of the optimal production
schedule, $x^*(\alpha)$, as a function of $\alpha$. Note the markers at the
following values for $\alpha$, each of which is a point of change marking
an execution of Step 3: 0.6024, 0.7819, 0.8558. Since no false or
silent alarms are encountered at any of these points, Step 3 is
executed in each case with no erroneous trials. Figure 2 presents
graphs of $u_i^*(\alpha)$ and $g_i(x^*(\alpha))$ ($i = 5,6,7$). Note that the dual
variables (or "shadow prices") $u_i^*(\alpha)$ ($i = 1, \ldots, 4$) are not graphed,
since they are identically zero on $[0,1]$, and that it is not necessary
to graph the non-negativity constraints. Figure 3 is a plot of the
efficient outcomes associated with the two criterion functions, i.e., a
tradeoff curve. It shows, for example, that production plan $x^*(0.807)$
guarantees a profit of at least $52,700 with probability .95 and an
expected profit of $79,100.
APPENDIX A


Some Properties of Convex Sets and Concave Functions

A set $S$ in $E^n$ is said to be convex if $(\lambda x' + (1-\lambda)x'') \in S$
whenever $x', x'' \in S$ and $0 \le \lambda \le 1$.

A function $f(x)$ which is defined on a convex set $S$ is said
to be concave if $f(\lambda x' + (1-\lambda)x'') \ge \lambda f(x') + (1-\lambda) f(x'')$ whenever
$x', x'' \in S$ and $0 \le \lambda \le 1$. If the first inequality holds strictly
whenever $x' \ne x''$ and $0 < \lambda < 1$, $f(x)$ is said to be strictly
concave. The function $-f(x)$ is said to be convex or strictly
convex according as $f(x)$ is concave or strictly concave. When the
convex set $S$ is not specified explicitly, it is implicitly taken
to be the entire space.
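
The defining inequality lends itself to a quick numerical spot check.
The Python sketch below is only an illustration for functions of one
variable: it samples pairs of points and weights and reports whether
the concavity inequality is ever violated. A sampled check of this
kind can refute concavity but cannot prove it; the helper name and the
tolerance are assumptions of the sketch.

```python
import math
import random


def violates_concavity(f, points, trials=1000, seed=0, tol=1e-12):
    """Return True if some sampled (x1, x2, lam) violates the concavity inequality.

    Checks f(lam*x1 + (1-lam)*x2) >= lam*f(x1) + (1-lam)*f(x2) on random
    samples drawn from `points`; `tol` absorbs floating-point rounding.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.choice(points), rng.choice(points)
        lam = rng.random()
        mid = lam * x1 + (1.0 - lam) * x2
        if f(mid) < lam * f(x1) + (1.0 - lam) * f(x2) - tol:
            return True
    return False


if __name__ == "__main__":
    grid = [0.5 * j for j in range(21)]  # sample points in [0, 10]
    print(violates_concavity(lambda x: math.log(1.0 + x), grid))  # concave function
    print(violates_concavity(lambda x: x * x, grid))              # convex function
```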

The following properties of convex sets and concave functions
are used in the text. The proofs, most of which follow easily from
the definitions, may be found in Fenchel (1953) or Zoutendijk (1960).

A.1  If $g_i(x)$ ($i = 1, \ldots, m$) are concave functions on $E^n$,
     then $\{x : g_i(x) \ge 0, i = 1, \ldots, m\}$ is a closed and convex
     set.

A.2  Any local maximum of a concave function on a convex set is
     also a global maximum over that set; a strictly concave
     function can have at most one local maximum.

A.3  A twice-differentiable function defined on a convex set $S$
     is concave if and only if its hessian matrix is negative
     semidefinite at each $x \in S$. If the hessian is negative
     definite at each $x \in S$, then the function is strictly
     concave (the converse is not true in general, but does
     hold when the function is a quadratic polynomial and $S = E^n$).

A.4  If $f_i(x)$ ($i = 1, \ldots, k$) are concave functions on a convex
     set $S$, and $u_i \ge 0$ ($i = 1, \ldots, k$), at least one $u_i > 0$,
     then $\sum_{i=1}^k u_i f_i(x)$ is concave on $S$; if $f_i(x)$ is strictly
     concave for some $i$ such that $u_i > 0$, then $\sum_{i=1}^k u_i f_i(x)$
     is strictly concave.

A.5  A concave or convex function on a convex set $S$ is continuous
     at every relative interior point of $S$.

A.6  If $f(x)$ is differentiable and concave on a convex set $S$
     and $\nabla_x f(x^0) = 0$, $x^0 \in S$, then $f(x^0) \ge f(x)$ for all
     $x \in S$.

A.7  The Theorem of the Separating Hyperplane asserts that if
     $S$ and $T$ are two convex sets in $E^n$ with no interior
     point in common, then there exist an n-vector $a \ne 0$ and
     a scalar $c$ such that $\sum_i a_i s_i \le c \le \sum_i a_i t_i$ for all
     $s \in S$, $t \in T$ (see Karlin, 1959, p. 398 for a proof).
APPENDIX B


Graphical Examples

We shall illustrate the Basic Conceptual Algorithm by considering
three examples of the form

(B.1)  Maximize over $x$:  $\alpha \sum_{i=1}^n -(x_i - c_i')^2 + (1-\alpha) \sum_{i=1}^n -(x_i - c_i)^2$
       subject to  $a^i x + b_i \ge 0$, $i = 1, \ldots, m$.

The first example is well-behaved in the sense that there are no
false or silent alarms (see section 4.1 for definitions of "false"
and "silent" alarms), whereas in the second and third examples such
troubles do occur.

Problems of the form (B.1) are among the simplest which can be
subsumed under the present theory: both objective functions are
quadratic and linearly separable, and the constraints are linear.
The fact that false and silent alarms can occur for such problems
seems to render unlikely the existence of a special class of $(P\alpha)$
for which false and silent alarms cannot occur.

The examples to be given are presented and analyzed graphically
rather than numerically because (B.1) is readily amenable to graphical
interpretation when n = 2 (the case considered here). Let $\alpha$ be
fixed. When $S$ is a consistent set, i.e., when
$X_S = \{x : a^i x + b_i = 0, \forall i \in S\} \ne \emptyset$, it follows from the
Kuhn-Tucker Theorem that $(=S)\alpha$ is necessary and sufficient for a
maximum of the objective function subject to $x \in X_S$. From the
circularity of the level curves of this particular objective function
it is evident that this constrained maximum is just the point of $X_S$
nearest to the unconstrained maximum $\bar{x}(\alpha) = \alpha c' + (1-\alpha)c$.
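
For a trial set consisting of a single constraint, this
nearest-point characterization can be written out in closed form. The
Python sketch below is an illustration under stated assumptions: it
takes the Kuhn-Tucker conditions in the form (gradient of the
objective) + u (gradient of the constraint) = 0, which matches the
later remark that the dual variables express minus the objective
gradient as a combination of constraint gradients, and it uses
hypothetical data. The function names are introduced here.

```python
def unconstrained_max(alpha, c_prime, c):
    """Unconstrained maximizer of (B.1): alpha*c' + (1 - alpha)*c."""
    return [alpha * cp + (1.0 - alpha) * ci for cp, ci in zip(c_prime, c)]


def solve_single_constraint(alpha, c_prime, c, a, b):
    """Solve the equality system for S = {i} with constraint a.x + b >= 0.

    Up to an additive constant the objective of (B.1) equals
    -||x - xbar||^2, so its gradient is -2*(x - xbar). Setting
    gradient + u*a = 0 together with a.x + b = 0 gives the point of the
    hyperplane nearest to xbar and the dual variable u.
    """
    xbar = unconstrained_max(alpha, c_prime, c)
    slack = sum(ai * xi for ai, xi in zip(a, xbar)) + b
    norm2 = sum(ai * ai for ai in a)
    u = -2.0 * slack / norm2
    x_s = [xi + 0.5 * u * ai for xi, ai in zip(xbar, a)]
    return x_s, u


if __name__ == "__main__":
    # Hypothetical data: c = (0, 0), c' = (4, 0), one constraint x_1 <= 2.
    c_prime, c, a, b = [4.0, 0.0], [0.0, 0.0], [-1.0, 0.0], 2.0
    for alpha in (0.25, 0.5, 0.75):
        print(alpha, solve_single_constraint(alpha, c_prime, c, a, b))
```

In this toy case the dual variable is negative while the unconstrained
maximum is strictly feasible (an optimality alarm for the trial set),
zero when it lies on the constraint, and positive once it has crossed
the constraint.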

Each figure is drawn in x-space (n = 2) with two constraints
(m = 2). The loci of $g_1(x) = 0$, $g_2(x) = 0$, the unconstrained
maximum $\bar{x}(\alpha)$, and the constrained maximum $x^*(\alpha)$ (the heavy line)
are drawn, as well as certain features pertaining to the points of
change. Light lines representing the projection of $\bar{x}(\alpha)$ onto the
feasible region are also drawn; in view of the circularity of the
level curves of the objective function for fixed $\alpha$, these lines are
in the direction of the gradient of the objective function at $x^*(\alpha)$.
The gradients of the constraints point into the feasible region.
From $(=S)\alpha$ we see that the dual variables express minus the
gradient of the objective function at $x^S(\alpha)$ as a linear combination
of the gradients of the constraints in $S$. The signs of $u_i^S(\alpha)$
($i \in S$) are easily determined by visual inspection of the figures.

The first example is presented graphically in Figure B.1. At
$\alpha = 0$ the unconstrained maximum $\bar{x}(0)$ is interior to the feasible
region. Thus the constrained maximum $x^*(0)$ equals $\bar{x}(0)$ and
$B0 = \emptyset$, which implies that $A0 = \emptyset$ since $A\alpha \subseteq B\alpha$ for all $\alpha$.
We are obliged to let $S^0 = \emptyset$, for the empty set is the only valid
set at $\alpha = 0$ (recall that $S$ is valid at $\alpha$ if and only if
$A\alpha \subseteq S \subseteq B\alpha$). Step 1 is complete. Step 2 demands that we solve
$(=\emptyset)\alpha$ as $\alpha$ increases above 0 until an alarm is given, i.e.,
until $x^{\emptyset}(\alpha)$ leaves the feasible region or $u_i^{\emptyset}(\alpha)$ becomes negative
for some $i$. The last alternative (an optimality alarm) cannot happen
for $S^0 = \emptyset$, for $(=\emptyset)\alpha$ requires $u^{\emptyset}(\alpha) = 0$. Only the first alternative
(a feasibility alarm) can occur. Equations $(=\emptyset)\alpha$ are easily seen
to be the conditions for an unconstrained maximum. Since $\bar{x}(\alpha)$ is
interior to the feasible region for $0 \le \alpha < \alpha_1$, no alarms are given
on $[0,\alpha_1)$; $(x^{\emptyset}(\alpha), u^{\emptyset}(\alpha)) = (x^*(\alpha), u^*(\alpha)) = (\bar{x}(\alpha), 0)$ and
$A\alpha = B\alpha = \emptyset$ on $[0,\alpha_1)$. At $\alpha_1$ the unconstrained maximum happens
to be on the boundary of the feasible region, but beyond $\alpha_1$ it
violates the first constraint, i.e. $(=\emptyset)\alpha$ leads to a feasibility
alarm for $g_1$ just above $\alpha_1$. Thus $\alpha_1$ is the point of change which
completes Step 2, and
$(x^{\emptyset}(\alpha_1), u^{\emptyset}(\alpha_1)) = (x^*(\alpha_1), u^*(\alpha_1)) = (\bar{x}(\alpha_1), 0)$,
$A\alpha_1 = \emptyset$, $B\alpha_1 = \{1\}$. Since $\alpha_1 < 1$, we go to Step 3. Two sets are
valid at $\alpha_1$: $\emptyset$ and $\{1\}$. The former was seen at Step 2 not to be
valid above $\alpha_1$, and so the latter must be. Control is now returned to
Step 2 with $S^0 = \{1\}$.

To execute Step 2 for the second time we must solve $(=\{1\})\alpha$ as
$\alpha$ increases above $\alpha_1$ until an alarm obtains. These equations are
the conditions for a maximum of the objective function subject to the
first constraint being exactly satisfied. As $\alpha$ increases above
$\alpha_1$, $x^{1}(\alpha)$ moves along the portion of the boundary determined by
the first constraint; since minus the gradient of the objective
function at $x^{1}(\alpha)$ is expressed as $u_1^{1}(\alpha)$ times the gradient of
$g_1$, it is geometrically clear that $u_1^{1}(\alpha)$ grows increasingly positive
as $\alpha$ increases. Hence no alarms are given until $\alpha_2$ is passed,
when the second constraint begins to be violated. We have
$x^{1}(\alpha) = x^*(\alpha)$, $u_1^{1}(\alpha) = u_1^*(\alpha) > 0$, $u_2^{1}(\alpha) = u_2^*(\alpha) = 0$,
$A\alpha = B\alpha = \{1\}$ on $(\alpha_1, \alpha_2)$. Since $\alpha_2 < 1$ is the point of change
at which Step 2 is completed, we go to Step 3. Now $A\alpha_2 = \{1\}$ and
$B\alpha_2 = \{1,2\}$, so that $\{1\}$ and $\{1,2\}$ are valid at $\alpha_2$; since the
former was seen not to be valid just above $\alpha_2$, the latter must be.
Control is returned to Step 2 again, this time with $S^0 = \{1,2\}$.

Step 2 now requires that $(=\{1,2\})\alpha$ be solved as $\alpha$ increases
above $\alpha_2$ until an alarm occurs. These equations are the conditions
for a maximum of the objective function subject to both constraints
being satisfied exactly. Since the intersection of the two equality
constraints determines a unique point, $x^{1,2}(\alpha)$ is constant for all
$\alpha$. The projection lines of $\bar{x}(\alpha)$ onto the feasible region and the
interpretation of the dual variables make it clear that $u^{1,2}(\alpha) > 0$
on $(\alpha_2, \alpha_3)$; $u_1^{1,2}(\alpha_3) = 0$, $u_2^{1,2}(\alpha_3) > 0$; and $u_1^{1,2}(\alpha) < 0$,
$u_2^{1,2}(\alpha) > 0$ for $\alpha > \alpha_3$. In other words, an optimality alarm occurs
for the first constraint just above $\alpha_3$, so that Step 2 is complete
at that point of change. Going to Step 3, we see that $A\alpha_3 = \{2\}$,
$B\alpha_3 = \{1,2\}$; since the latter is not valid just above $\alpha_3$ the former
must be. Control is returned to Step 2 with $S^0 = \{2\}$.

At Step 2, $(=\{2\})\alpha$ must be solved as $\alpha$ increases above $\alpha_3$.
Reasoning as before, we see that $\{2\}$ remains valid on $[\alpha_3, 1]$.
Hence $x^{2}(\alpha) = x^*(\alpha)$, $A\alpha = B\alpha = \{2\}$, $u_1^{2}(\alpha) = 0$, and $u_2^{2}(\alpha) > 0$
on $(\alpha_3, 1]$.

This completes the solution of the first example. A summary
appears in Table B.1. Note that there were no false or silent alarms,
and no erroneous trials at any Step 3.




The second and third examples are presented graphically in
Figures B.2 and B.3. The summaries which appear in the corresponding
Tables B.2 and B.3 can be constructed by following the lines of
reasoning illustrated in the above discussion of the first example.
Nevertheless, certain of the entries are reasoned out below. The
second example is designed to show that false feasibility and silent
optimality alarms can occur, the third to show that false optimality
and silent feasibility alarms can occur.

The second example is very much like the first, except that the
unconstrained maximum happens to pass through the vertex of the feasible
region. At $\alpha = \alpha_1$: $x^*(\alpha_1) = \bar{x}(\alpha_1)$, $A\alpha_1 = \emptyset$, and $B\alpha_1 = \{1,2\}$.
At Step 3 one must solve $(=S)\alpha$ for $\alpha$ just above $\alpha_1$, $S$ valid at
$\alpha_1$, until a set which is valid just above $\alpha_1$ is found. The four
sets $\emptyset$, $\{1\}$, $\{2\}$, and $\{1,2\}$ are valid at $\alpha_1$. If one tries $\emptyset$,
it is clear that $x^{\emptyset}(\alpha) = \bar{x}(\alpha)$ violates both constraints as $\alpha$
increases above $\alpha_1$, and also that only $\{2\}$ is valid just above
$\alpha_1$. Hence there is a false feasibility alarm for $g_1$, for $g_1$ is
not in the deficiency of $\emptyset$ and is not degenerate just above $\alpha_1$.
See the second line of Table B.2. If one tries $\{1\}$, $(=\{1\})\alpha$ are
the conditions for a maximum of the objective function subject to the
first constraint being exactly satisfied. It is evident that $x^{1}(\alpha)$
violates the second constraint above $\alpha_1$, i.e. a feasibility
alarm for $g_2$ obtains. Since minus the gradient of the objective
function at $x^{1}(\alpha)$ is expressed as $u_1^{1}(\alpha)$ times the gradient of
$g_1$, $u_1^{1}(\alpha)$ is seen to be positive above $\alpha_1$. Thus no optimality
alarm obtains for $u_1$, which means, in view of the unique validity
of $\{2\}$ above $\alpha_1$ and the fact that $g_1$ is in the excess of $\{1\}$
just above $\alpha_1$, that $(=\{1\})\alpha$ leads to a silent optimality alarm.
See the third line of Table B.2.

In the third and last example, the unconstrained maximum again
happens to pass through the vertex of the feasible region. At $\alpha = \alpha_1$,
we have $x^*(\alpha_1) = \bar{x}(\alpha_1)$, $A\alpha_1 = \emptyset$, and $B\alpha_1 = \{1,2\}$. The valid sets
at $\alpha_1$ are $\emptyset$, $\{1\}$, $\{2\}$, and $\{1,2\}$. The only set which is valid
just above $\alpha_1$ is $\{2\}$. If one tries $\{1\}$ at Step 3, $x^{1}(\alpha)$
evidently remains feasible. Since $g_2$ is in the deficiency of $\{1\}$
just above $\alpha_1$, we see that $(=\{1\})\alpha$ leads to a silent feasibility
alarm, as recorded in the third line of Table B.3. If one tries
$\{1,2\}$, $x^{1,2}(\alpha)$ must remain at the intersection of the two equality
constraints. It is graphically clear that minus the gradient of the
objective function at $x^{1,2}(\alpha) = x^*(\alpha_1)$ is represented by a negative
linear combination of the gradients of the constraints as $\alpha$ increases
above $\alpha_1$, so that optimality alarms occur for both constraints.
Since $g_2$ is not in the excess of $\{1,2\}$ and is not degenerate just
above $\alpha_1$, a false optimality alarm registers for the second
constraint. See the fifth line of Table B.3.
                                 Feasibility and Optimality    Deficiency and Excess
                 Valid Sets      Alarms Due to $S$             of $S$
$\alpha$              at $\alpha$: $S$     Just Above $\alpha$               Just Above $\alpha$

                                 Feasibility    Optimality     Deficiency    Excess

$[0,\alpha_1)$        $\emptyset$              -              -              -             -

$\alpha_1$            $\emptyset$              $\{1,2\}$ 1/     None           $\{2\}$         None
                 $\{1\}$            $\{2\}$          None 2/        $\{2\}$         $\{1\}$
                 $\{2\}$            None           None           None          None
                 $\{1,2\}$          None           $\{1\}$          None          $\{1\}$

$(\alpha_1,1]$        $\{2\}$            -              -              -             -

                                 Table B.2

1/ False feasibility alarm for $g_1$.

2/ Silent optimality alarm: no optimality alarm for $u_1$.


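                                 Feasibility and Optimality    Deficiency and Excess
                 Valid Sets      Alarms Due to $S$             of $S$
$\alpha$              at $\alpha$: $S$     Just Above $\alpha$               Just Above $\alpha$

                                 Feasibility    Optimality     Deficiency    Excess

$[0,\alpha_1)$        $\emptyset$              -              -              -             -

$\alpha_1$            $\emptyset$              $\{2\}$          None           $\{2\}$         None
                 $\{1\}$            None 1/        $\{1\}$          $\{2\}$         $\{1\}$
                 $\{2\}$            None           None           None          None
                 $\{1,2\}$          None           $\{1,2\}$ 2/     None          $\{1\}$

$(\alpha_1,1]$        $\{2\}$            -              -              -             -

                                 Table B.3

1/ Silent feasibility alarm: no feasibility alarm for $g_2$.

2/ False optimality alarm for $u_2$.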
vanish at $x^*$, where $f(x^*) = 0$. Then Newton's method is well-defined
and quadratically convergent to $x^*$ if the starting point $x^0$ is
in a sufficiently small neighborhood of $x^*$.

See Householder (1953, p. 136) for a proof.

Quadratic convergence of the sequence $\langle x^k \rangle$ $(k = 0, 1, \ldots)$
to $x^*$ means that (here $\|\cdot\|$ denotes the Euclidean norm)

$$\lim_{k \to \infty} \frac{\|x^k - x^*\|}{\|x^{k-1} - x^*\|^2} = \text{a constant} \ne 0 .$$

By way of contrast, linear convergence would mean that

$$\lim_{k \to \infty} \frac{\|x^k - x^*\|}{\|x^{k-1} - x^*\|} = \text{a constant} \ne 0 .$$

Evidently the quadratic convergence of Newton's method is a highly
desirable feature. The price one pays for it is the necessity of
evaluating an inverse matrix at each iteration and of having
a good starting point. To ameliorate the first disadvantage, at some
expense of speed of convergence, approximate inverses can be used.
Often one can achieve a substantial net gain in computational efficiency
by judicious application of this idea (see, for example, Ostrowski,
1960, and Householder, 1953, p. 136).
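
The two rates defined above can be seen numerically. The following is a minimal sketch and not part of the original text: it applies Newton's method to the scalar equation f(x) = x^2 - 2 = 0, which is an illustrative choice, and prints the two ratios that appear in the definitions. The function name `newton_errors` is hypothetical.

```python
import math

def newton_errors(x0, steps):
    """Run Newton's method on f(x) = x**2 - 2 and return the errors |x_k - x*|."""
    root = math.sqrt(2.0)                      # the solution x* of f(x) = 0
    x, errors = x0, [abs(x0 - root)]
    for _ in range(steps):
        x = x - (x * x - 2.0) / (2.0 * x)      # x_{k+1} = x_k - f(x_k) / f'(x_k)
        errors.append(abs(x - root))
    return errors

# Only a few steps are taken so that rounding error does not dominate the ratios.
errors = newton_errors(1.5, 3)
# Ratio from the definition of quadratic convergence: e_k / e_{k-1}**2.
quadratic_ratios = [errors[k] / errors[k - 1] ** 2 for k in range(1, len(errors))]
# Ratio from the definition of linear convergence: e_k / e_{k-1}.
linear_ratios = [errors[k] / errors[k - 1] for k in range(1, len(errors))]
print(quadratic_ratios)
print(linear_ratios)
```

For this example the quadratic ratios settle near 1/(2*sqrt(2)), a nonzero constant, while the linear ratios shrink toward zero.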

For the purpose of proving Theorems 4.1 and 4.2, we find it more
convenient to employ
Theorem C.2:

Let $x^*$ satisfy $f(x^*) = 0$. Assume that there exists a neighborhood
$N(x^*)$ on which the following three assertions hold:

(a) The functions $f_i(x)$ $(i = 1, \ldots, n)$ are twice continuously
differentiable.

(b) The Jacobian $\dfrac{\partial(f(x))}{\partial(x)} \ne 0$.

(c) $\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \dfrac{\partial F_i(x)}{\partial x_j} \right)^{2} \right]^{1/2} \le L < 1$.

Then Newton's method (C.2) is well-defined and quadratically
convergent to $x^*$ if the starting point $x^0$ is in $N(x^*)$.

This theorem follows from results given in Householder (1953,
p. 135) and Henrici (1964, p. 101).

Remark: The square-root expression in (c) is an upper estimate of
the Euclidean norm of the Jacobian matrix of $F(x)$ (see
Faddeeva, 1959, p. 121).


For reference we record the recursion equation of Newton's method
applied to $(=S)\alpha_0$. We have, for $k = 0, 1, 2, \ldots$,

$$\begin{pmatrix} x \\ u_S \end{pmatrix}^{k+1} = \begin{pmatrix} x \\ u_S \end{pmatrix}^{k} - \begin{bmatrix} H & D^t \\ D & 0 \end{bmatrix}^{-1} \begin{pmatrix} \nabla_x \big[ f(x;\alpha) + \sum_{i \in S} u_i g_i(x) \big] \\ g_S(x) \end{pmatrix} \qquad \text{(C.3)}$$

where $H = \nabla_x^2 \big[ f(x;\alpha) + \sum_{i \in S} u_i g_i(x) \big]$, $D$ is the matrix whose rows
are $\nabla_x g_i(x)$ $(i \in S)$, and $u_S$ and $g_S$ are the vectors obtained
by deleting from $u$ and $g(x)$ the components not in $S$; all quantities
on the right-hand side are evaluated at $\alpha_0$ and $(x^k, u_S^k)$.

Note that the equations $u_i = 0$ $(i \notin S)$, which are a part of
$(=S)\alpha_0$, have been omitted from the recursion because they are already
solved.

In order to have a compact notation for the square-root expression
in (c) of Theorem C.2 specialized to $(=S)\alpha_0$, we denote by $A(x, u; \alpha_0, S)$
the square root of the sum of the squares of all the elements of the
Jacobian matrix of the iteration function appearing in (C.3) (i.e., of
the Jacobian matrix of the right-hand side of (C.3) considered as
a vector-valued function of $(x, u_S)$).

2.  Convenient  Partitions  of  the  Inverse  Matrix  Required  by  Newton's 
Method 
Let

$$\begin{bmatrix} H & D^t \\ D & 0 \end{bmatrix}$$

be defined as in (C.3). Under our conditions, it is easily verified
that

$$\begin{bmatrix} H & D^t \\ D & 0 \end{bmatrix}^{-1} = \begin{bmatrix} H^{-1} - H^{-1}D^t(DH^{-1}D^t)^{-1}DH^{-1} & H^{-1}D^t(DH^{-1}D^t)^{-1} \\ (DH^{-1}D^t)^{-1}DH^{-1} & -(DH^{-1}D^t)^{-1} \end{bmatrix} \qquad \text{(C.4)}$$

Let there be $s$ elements in $S$ (by Condition 4, $s < n$). The
inversion of the $n+s$ by $n+s$ matrix has been reduced to the inversion
of two matrices, one $n$ by $n$ ($H$) and the other $s$ by $s$ ($DH^{-1}D^t$),
and to several matrix multiplications.
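
As an illustration of how the partitioned inverse enters a Newton step of the form (C.3), the following minimal sketch treats a small problem that is not taken from the original text: maximize f(x) = -(x1 - 2)^2 - 2(x2 - 1)^2 subject to g(x) = 1 - x1 - x2 >= 0, with the single constraint held as an equality (so s = 1). The objective, the constraint, and the function name `newton_step` are all assumptions made for the example. Because H is diagonal here, its inverse is formed elementwise, and $DH^{-1}D^t$ is a scalar.

```python
def newton_step(x, u):
    """One Newton step for grad_x[f + u*g] = 0, g = 0, using the block inverse (C.4)."""
    # Residual vector: gradient of the Lagrangian and the constraint value.
    grad_L = [-2.0 * (x[0] - 2.0) - u, -4.0 * (x[1] - 1.0) - u]
    g = 1.0 - x[0] - x[1]
    # H = diag(-2, -4) for this example; D has the single row grad g = (-1, -1).
    H_inv = [-0.5, -0.25]
    D = [-1.0, -1.0]
    HinvDt = [H_inv[i] * D[i] for i in range(2)]              # H^{-1} D^t (n by 1)
    B_inv = 1.0 / sum(D[i] * HinvDt[i] for i in range(2))     # (D H^{-1} D^t)^{-1}
    # The four blocks of the inverse in (C.4).
    top_left = [[(H_inv[i] if i == j else 0.0) - HinvDt[i] * B_inv * HinvDt[j]
                 for j in range(2)] for i in range(2)]
    top_right = [HinvDt[i] * B_inv for i in range(2)]
    bottom_left = top_right        # transpose of top_right, since H is symmetric
    bottom_right = -B_inv
    # New iterate = old iterate minus (block inverse) times (residual).
    dx = [sum(top_left[i][j] * grad_L[j] for j in range(2)) + top_right[i] * g
          for i in range(2)]
    du = sum(bottom_left[j] * grad_L[j] for j in range(2)) + bottom_right * g
    return [x[0] - dx[0], x[1] - dx[1]], u - du

x_new, u_new = newton_step([0.0, 0.0], 0.0)
print(x_new, u_new)
```

Since this example has a quadratic objective and a linear constraint, a single step from any starting point lands on the solution x = (2/3, 1/3), u = 8/3; for general problems the step would be repeated with H and D re-evaluated.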
Whereas the size of $H$ remains constant no matter what $S$ is,
the dimension of $DH^{-1}D^t$ does vary with $S$, for during Step 3 rows
are added to and deleted from $D$ as $S$ changes. It is advantageous
to use bordering methods to pass from an available $(DH^{-1}D^t)^{-1}$ to
the next when $S$ is changed at Step 3. We shall consider the case
in which one row is added to "the bottom" of $D$, and also the case
in which the last row of $D$ is deleted. Results similar to the
following can be derived to cover the addition or deletion of an
arbitrary row, and also multiple additions and/or deletions.

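If one row $d$ is to be added to $D$, then

$$\begin{bmatrix} DH^{-1}D^t & DH^{-1}d^t \\ dH^{-1}D^t & dH^{-1}d^t \end{bmatrix}^{-1} = \begin{bmatrix} (DH^{-1}D^t)^{-1} + \dfrac{Qd^t\, dQ^t}{dRd^t} & \dfrac{Qd^t}{dRd^t} \\ \dfrac{dQ^t}{dRd^t} & \dfrac{1}{dRd^t} \end{bmatrix}$$

where $Q = -(DH^{-1}D^t)^{-1}DH^{-1}$ and $R = H^{-1} - H^{-1}D^t(DH^{-1}D^t)^{-1}DH^{-1}$.
Note that $Q$ and $R$ are immediately available from (C.4).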
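Let $D$ be written

$$D = \begin{bmatrix} \bar{D} \\ d \end{bmatrix}$$

and let $(DH^{-1}D^t)^{-1}$ be written

$$(DH^{-1}D^t)^{-1} = \begin{bmatrix} T_1 & T_2 \\ T_2^t & T_3 \end{bmatrix}$$

where $T_3$ is 1 by 1 (i.e., $T_3$ is a scalar). If row $d$ is deleted
from $D$, then

$$(\bar{D}H^{-1}\bar{D}^t)^{-1} = T_1 - \frac{T_2 T_2^t}{T_3} .$$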
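
The two bordering updates of this section can be checked on a small case. The sketch below is illustrative only: the helper names (`matmul`, `transpose`, `add_row`, `delete_last_row`) and the numerical data are assumptions, H is taken symmetric, and the labels T1, T2 for the blocks other than the scalar T3 are assumed names. It appends one row to D using Q and R, confirms the result against the bordered matrix, and then deletes the row again.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add_row(B_inv, D, H_inv, d):
    """Inverse of the bordered matrix after appending row d to the bottom of D."""
    n, s = len(H_inv), len(B_inv)
    # Q = -(D H^-1 D^t)^-1 D H^-1
    Q = [[-v for v in row] for row in matmul(matmul(B_inv, D), H_inv)]
    # R = H^-1 - H^-1 D^t (D H^-1 D^t)^-1 D H^-1   (uses symmetry of H)
    HD = matmul(H_inv, transpose(D))
    corr = matmul(matmul(HD, B_inv), transpose(HD))
    R = [[H_inv[i][j] - corr[i][j] for j in range(n)] for i in range(n)]
    dt = transpose([d])
    sigma = matmul(matmul([d], R), dt)[0][0]          # the scalar d R d^t
    q = [row[0] for row in matmul(Q, dt)]             # the column Q d^t
    top = [[B_inv[i][j] + q[i] * q[j] / sigma for j in range(s)] + [q[i] / sigma]
           for i in range(s)]
    return top + [[qi / sigma for qi in q] + [1.0 / sigma]]

def delete_last_row(T):
    """Inverse after removing the last row of D: T1 - T2 T2^t / T3."""
    s = len(T) - 1
    return [[T[i][j] - T[i][s] * T[s][j] / T[s][s] for j in range(s)] for i in range(s)]

H_inv = [[0.5, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.2]]
D = [[1.0, 0.0, 1.0]]
d = [0.0, 1.0, 1.0]
B_inv = [[1.0 / 0.7]]                 # (D H^-1 D^t)^-1 for the one-row D above

T = add_row(B_inv, D, H_inv, d)
D2 = D + [d]
M = matmul(matmul(D2, H_inv), transpose(D2))   # bordered matrix, formed directly
check = matmul(T, M)                            # should be close to the identity
back = delete_last_row(T)                       # should recover B_inv
print(check)
print(back)
```

Neither update inverts a matrix from scratch; each uses only products and one scalar division, which is the point of the bordering approach.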

3.  A  Refinement  Method  for  Approximate  Matrix  Inverses 

Suppose that $A$ is a square matrix whose inverse exists and is
desired to be found, and that an approximate inverse $B_0$ is available.
The error inherent in $B_0$ causes the matrix $I - AB_0$ not to vanish.
If$^{1/}$ $\|I - AB_0\| \le L < 1$, then the recursion

$$B_{k+1} = B_k (2I - AB_k), \qquad k = 0, 1, 2, \ldots$$

converges to $A^{-1}$, and the considerable rapidity of the convergence
is apparent from the estimate

$$\|B_k - A^{-1}\| \le \|B_0\| \, \frac{L^{2^k}}{1 - L} .$$

See Faddeeva (1959, pp. 99-102) for further details on this
method, which is due to H. Hotelling.

It is clear that this device can be used to great advantage
in maintaining an arbitrarily accurate approximation to $H^{-1}$ as $\alpha$
increases (for the elements of $H$, and therefore of $H^{-1}$, are
continuous functions of $\alpha$ on the unit interval), and also to
$(DH^{-1}D^t)^{-1}$ so long as $S$ stays the same.
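
The refinement can be tried on a small matrix. The following is a minimal sketch under stated assumptions: the matrix A, the crude starting approximation B0, and the helper names are hypothetical, the step is written as B(2I - AB) in the equivalent form B + B(I - AB), and the norm is the maximum absolute column sum described in the footnote to this section. The loop records the actual error next to the error estimate at each stage.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def norm1(A):
    """Maximum absolute column sum (the "p = 1 norm")."""
    return max(sum(abs(v) for v in col) for col in zip(*A))

def residual(A, B):
    """The matrix I - A B."""
    AB = matmul(A, B)
    n = len(A)
    return [[(1.0 if i == j else 0.0) - AB[i][j] for j in range(n)] for i in range(n)]

def refine(A, B):
    """One refinement step: B + B (I - A B), which equals B (2I - A B)."""
    BR = matmul(B, residual(A, B))
    return [[b + c for b, c in zip(rb, rc)] for rb, rc in zip(B, BR)]

A = [[4.0, 1.0], [2.0, 3.0]]
A_inv = [[0.3, -0.1], [-0.2, 0.4]]     # exact inverse, used only to measure the error
B0 = [[0.25, 0.0], [0.0, 0.3]]         # crude approximate inverse (an assumption)
L = norm1(residual(A, B0))             # must be below 1 for the estimate to apply

B, errors, bounds = B0, [], []
for k in range(6):
    diff = [[b - a for b, a in zip(rb, ra)] for rb, ra in zip(B, A_inv)]
    errors.append(norm1(diff))                            # ||B_k - A^{-1}||
    bounds.append(norm1(B0) * L ** (2 ** k) / (1.0 - L))  # estimate from the text
    B = refine(A, B)

for e, b in zip(errors, bounds):
    print(e, b)
```

Each step costs two matrix multiplications, and the exponent 2^k in the estimate means the number of correct digits roughly doubles per step once the residual norm is below one.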

1/ We define the norm $\|A\|$ of any $n$ by $n$ matrix $A$ as
$\max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|$. Other norms could be used, but this one (the
so-called "p = 1 norm") is particularly convenient for computational
purposes.
4. Formulae for $d(x^S(\alpha), u_S(\alpha))/d\alpha$

It may be shown by implicitly differentiating $(=S)\alpha$ that the
following additional conclusions can be added to Theorem 2: for
a  6  IQ!  j 

d(x®(a))/da  =  - 

d(Ug(a))/do:  =  -SJ^f^ix))  , 

where $R$ and $Q$ are as in section 2 above and all quantities are
evaluated at $(x^S(\alpha), u_S(\alpha))$.

These formulae are of possible interest for the purpose of
facilitating the convergence of Newton's method, when fairly large
step sizes are being used, by extrapolating to better starting points.
Note that $R$ and $Q$ are immediately available from (C.4).